ABSTRACT


Demonstration of required reliability performance levels prior to system fielding
has remained a challenge for the Army, and in recent years, the success rate of systems
achieving their stated reliability performance in operational tests has declined.
Realization of required reliability performance necessitates effective management 
strategies and techniques in order to reduce risks. Furthermore, managing reliability
performance does not stop upon fielding; performance must be continually monitored and assessed
for potential improvements and efficiencies in support of meeting Army readiness 
objectives. 

The objective of this research is to ascertain common management issues that 
many Program and Project Managers deal with concerning reliability, identify their root 
causes, and suggest potential methods for mitigating these risks. To gather these data, the 
researcher drew directly from experiences of programs within Program Executive Office 
for Intelligence, Electronic Warfare & Sensors (PEO IEW&S). The programs
participating cover the full spectrum of Acquisition Category (ACAT) levels and cross all 
acquisition phases. Results show that the key to success resides in early identification of 
upfront cost-effective opportunities for improving reliability performance, and mitigation 
of associated risks during design, manufacturing development, test, and post-production. 
Predictability in the field is the desired end state.


I. INTRODUCTION


A. BACKGROUND DISCUSSION

The Army Vision calls for a strategically responsive force with the ability to put a 
combat capable brigade anywhere in the world in 96 hours; a division in 120 hours; and 
five divisions in 30 days. This equates to a need for high readiness levels for rapid 
deployment, and a significantly reduced logistics footprint in the battlespace without 
jeopardizing combat capability. One enabler to achieving this is highly reliable systems. 
Highly reliable systems are force effectiveness multipliers, as the resulting benefits 
contribute towards reduced maintenance times, increased system availability, reduced 
training and manpower, fewer spare parts, and a net reduction in total ownership costs
(TOC) that equates to the freeing up of scarce funds needed for Army modernization.

Demonstration of required reliability performance levels prior to system fielding 
has remained a challenge for the Army. According to the Army Test and Evaluation
Command (ATEC), the success rate for systems either in development or operational
testing over a 5-year period from 1996 to 2000 was only 36%, and the system operational
test success rate with respect to reliability was only 20%. [Ref 1] Failure to achieve
reliability performance requirements at this late stage of development can have 
devastating impacts on a program, to include fielding delays, or fielding of a less than 
optimal solution, with resultant increased costs to address and retest problems later. 

The United States General Accounting Office (GAO) has addressed the issue of 
“late cycle churn”, the scramble to fix significant problems discovered late in a weapon
system’s development, and concluded that, among other things, early testing to validate
product knowledge is key. [Ref 2] Likewise, there are many early, upfront opportunities
in a program for addressing reliability. First, requirements generation and the systems
engineering process are areas where early influence can make a difference. Secondly,
program planning and organizational management can emphasize a rigorous reliability
process throughout the development phase. Lastly, incremental testing to confirm
attainment of increasing levels of system maturity will help ensure that systems operate in the
field as intended.

B. OBJECTIVES AND PURPOSE OF THE RESEARCH

Achievement of required reliability performance necessitates effective
management strategies and techniques to address reliability risks over the course of a
weapon system’s development and fielding. This research evaluates how weapon system
reliability performance is managed in the acquisition process, and the challenges
encountered by the Army in achieving operational requirements in support of readiness
objectives. It evaluates all aspects of reliability management, ascertains where there
are shortcomings, and provides recommendations for improvement. The objective is to
determine how to best manage reliability, identify upfront cost-effective opportunities for
improving reliability performance, and mitigate associated risks during design,
manufacturing development, test, and post-production. Predictability in the field is the
desired end state, which translates into increased operational availability; proficient use
of personnel and skills; realistic levels of spares and repair parts; and ultimately an
efficient and effective logistics tail that enables the Army to rapidly deploy and sustain
forces in any theater of operation.

C. RESEARCH QUESTIONS 

The primary research question is:

What essential steps can a Program Manager take to better manage weapon
system reliability requirements over a program’s life cycle, and how can reliability
performance be maintained and/or improved once the system is fielded?

Subsidiary research questions are: 

1. What are the predominant underlying factors that contribute to reliability 
performance in Army systems, and how can a Program Manager (PM)
mitigate risk in these areas? 

2. What are the current policies and regulations that govern reliability of 
weapon systems, and do they provide PMs with adequate guidance?

3. How does the Army address reliability performance of a weapon system in
the requirements generation process, and to what extent can a PM influence
this process? 

4. How is reliability addressed in the system engineering process, and what 
technology, tools and techniques are available to ensure reliability of a 
system is "designed in" upfront? 

5. How has acquisition reform and the shift to performance based contracting 
impacted the reliability of weapon systems? 

6. To what extent does commercial industry differ in its approach towards
product reliability, and can the Army leverage these best practices to 
improve performance in military systems? 

7. How is system reliability addressed as part of the test program, and what 
program strategies can a PM employ to ensure that a system will 
successfully pass reliability testing with a high level of confidence? 

8. How do PMs plan to manage and track reliability, and what metrics are 
useful for measuring reliability performance during various stages of 
system development? 

9. How does a PM contract and incentivize for reliability with industry, and 
are there potential areas for improvement? 

10. Once a system is fielded, how does a program office ensure reliability 
performance is maintained, and what further can be done to improve 
reliability performance of fielded systems? 


D. SCOPE, LIMITATIONS AND ASSUMPTIONS 

The scope of this research includes an evaluation of reliability management 
considerations from several aspects: 1) the requirements generation process and the 
interface with the User, 2) the Program Manager’s (PM) perspective during system 
development and test, 3) approaches to reliability growth, and 4) commercial best 
practices with Industry. Current policy and guidance regarding materiel developer
management responsibilities with respect to reliability are reviewed for adequacy.
Ongoing Army reliability improvement initiatives are also reviewed, to include an 
assessment of current technology, tools, and techniques available to PMs to manage 
system reliability maturation prior to transition to production. 

This research is limited to an analysis of systems in various stages of the 
acquisition process that are managed within the Program Executive Office for
Intelligence, Electronic Warfare & Sensors (PEO IEW&S). The analysis addresses
management approaches of PEO IEW&S and its PMs with respect to reliability
performance, common issues encountered by PMs and reasons why they occur, risk 
mitigation techniques, contracting approaches for reliability, and lessons learned. The 
analysis is limited to an assessment of reliability management and process issues, and 
does not specifically address commodity or technology driven reliability problems. 
Although this research is limited to reliability of sensors and electronics systems, it is 
assumed that the management challenges, issues, and potential solutions can apply to 
other types of weapon systems as well. 

E. METHODOLOGY 

The methodology used in this thesis research consists of two steps. The first step is
to provide an overview of the contemporary reliability environment within the Army. 
Current policies and regulations that govern reliability of weapon systems are reviewed 
for adequacy with respect to guidance given to materiel developers. The requirements 
generation process and the systems engineering process are evaluated with respect to how 
reliability requirements are dealt with, and to what extent these early processes influence 
reliability success in a program. Acquisition reform, current Army reliability initiatives, 
commercial best practices, and contracting methods for reliability are evaluated by 
literature reviews and interviews with acquisition professionals. Program management 
techniques and metrics that measure reliability performance are also assessed in the same 
manner. A comprehensive literature review on the subject of reliability draws on material
and sources that include, but are not limited to:

1. DoD and Army publications

2. Published academic research papers 

3. References, publications, and electronic media available at the Naval 
Postgraduate School (NPS) 

4. World Wide Web sources (DoD, commercial, academic) 

5. Interviews with School of Business and Public Policy faculty at NPS 

The second step entails an analysis of systems managed within the Program 
Executive Office for Intelligence, Electronic Warfare & Sensors (PEO IEW&S). This
analysis includes systems in various stages of the acquisition process: Concept and
Technology Development (CTD), System Development and Demonstration (SDD),
Production & Deployment (P&D), and Operations & Support (O&S). Data gathering and 
analysis were conducted by personal interviews, telephone calls, emails, and through a
reliability performance survey. Evaluation of systems in various stages of development 
and technical maturity provides a good cross-section of how reliability is managed across 
a program’s lifecycle. The analysis synthesizes various PMs’ perspectives on managing
reliability requirements; the coordination that is involved in dealing with the User, test
community, and Industry partners; the common issues; the reasons why they occur;
and how these risks can be reduced.

F. ORGANIZATION OF THE STUDY 

This thesis consists of five chapters. The first chapter is an introduction and 
provides the structure and lays the groundwork for the research methodology. Chapter II 
will define reliability and will provide background information as well as a discussion on 
policy and regulations regarding reliability. The status of reliability within the Army 
today will be addressed as well as current trends and issues concerning this important 
topic. 

Chapter III will provide background information on the systems managed by PEO
IEW&S that are a part of this study, present the results of a reliability performance
survey, and discuss programs’ experiences with managing reliability. This will include
relevant experiences regarding reliability in terms of developing valid requirements,
contracting, development and test, challenges during operational test, the impact of
acquisition reform, best practices, and finally, maintaining reliability in the field.

Chapter IV then analyzes and compiles the key issues and challenges associated
with reliability, and discusses risk mitigation techniques and strategies for maximizing
inherent reliability. Barriers associated with achieving stated reliability performance are
also addressed.

The final chapter draws conclusions, makes recommendations, and provides answers
to the primary and subsidiary research questions. Additionally, the final chapter will
suggest areas that require further research.


G. BENEFITS OF RESEARCH 

This thesis is conducted on behalf of PEO IEW&S and its PMs, and could have
broader Army benefits as well. The primary benefit of this study will be identification of
policy and program management issues with respect to weapon system reliability, and
recommendations for areas of potential improvement. It is intended to directly benefit
any PM that is, or will be, managing complex programs, by identifying potential pitfalls,
providing lessons learned, and suggesting methods for managing and reducing the
inherent risks associated with achieving stated reliability performance requirements of
weapon systems. Achieving stated weapon system reliability requirements is a challenge,
one that PEO IEW&S is constantly dealing with, especially with the complex,
software-intensive systems that it fields. Many organizations and working groups are aggressively
looking into methods to improve reliability, and with this study I intend to pull these
pieces together to present the “bigger picture”. By evaluating the common issues that
many PMs deal with, identifying their underlying root causes, and suggesting potential
methods for mitigating these risks, it is my hope that this study will benefit current and
future PMs, and ultimately the soldier.


II. RELIABILITY OVERVIEW AND BACKGROUND 


A. INTRODUCTION 

This chapter provides the reader with background information on reliability
management as it pertains to weapon systems in general, and within the framework of the
defense acquisition process. To begin this chapter, a number of reliability definitions and
terms are addressed to provide a common frame of reference and establish a general
basis of understanding for subsequent discussions. Following that, six main areas are
discussed. First, an examination of current DoD and Army policies, procedures, and
guidance regarding reliability is provided to establish the basis within which
organizations must operate to manage reliability within a program. Second, how
reliability fits within the framework of the acquisition process is reviewed. Third,
methods for managing reliability performance are addressed. Fourth, a comparison of
commercial vs. military reliability differences is provided. Fifth, the “cost” of reliability
is discussed. And finally, this chapter concludes with an examination of the status of
reliability trends and issues within the Army today.


B. RELIABILITY DEFINED

It is not surprising that the terminology used for reliability is nonstandard, and
tends to vary depending on the Service and/or system. Metrics employed in most
engineering disciplines are carefully defined and controlled in terms of method of
measurement, and there is generally a universal agreement on their definitions. On the
other hand, reliability, maintainability, and supportability fields use metrics that are
somewhat specialized rather than naturally defined. The 361-page book entitled
Reliability, Availability, and Maintainability (RAM) Dictionary, published by the
American Society for Quality Control and considered the "Webster’s Dictionary" of
RAM, illustrates this point. Moreover, there are in excess of 2000 reliability-related
terms defined in documents reviewed thus far, many of which have similar meaning but
different definitions. [Ref 3] It is important to note this because a clear understanding by
all parties is required on what the reliability terms signify in requirements documents and
in contract specifications.

1. Select DoD Reliability Definitions and Measures

Although now cancelled, MIL-STD-721C “Definition of Terms for Reliability
and Maintainability” previously provided DoD and defense contractors with common
definitions and terms. The Defense Systems Management College (DSMC) now
provides a comprehensive set of definitions regarding reliability, availability and
maintainability. The following definitions are found in the DSMC Acquisition Logistics
Guide: [Ref 4]

Reliability. Reliability is the probability that an item will perform its
intended function for a specified interval under stated conditions. In simple layman’s terms,
it is how long the system can work. Mean Time Between Failure (MTBF), a commonly
used measure, is defined as the total functioning life of a population of an item during a
specific measurement interval divided by the total number of failures within the population
during that interval.
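
As a minimal illustration of the MTBF definition above, the following Python sketch divides the total operating hours accumulated by a population of items by the number of failures observed in the same interval. The unit hours and failure count are hypothetical. The final step, which converts MTBF into a probability of failure-free operation, assumes a constant failure rate (exponential model); that assumption is introduced here for illustration and is not taken from the DSMC guide.

```python
import math

def mtbf(operating_hours, failures):
    """Total functioning life of the population divided by total failures."""
    if failures <= 0:
        raise ValueError("MTBF is undefined when no failures were observed")
    return sum(operating_hours) / failures

def reliability(mission_hours, mtbf_hours):
    """Probability of no failure over mission_hours.

    Assumes a constant failure rate (exponential model), which is an
    illustrative assumption rather than part of the definition.
    """
    return math.exp(-mission_hours / mtbf_hours)

# Hypothetical data: five fielded units and the hours each accumulated.
hours_per_unit = [400.0, 350.0, 500.0, 450.0, 300.0]
observed_failures = 4

m = mtbf(hours_per_unit, observed_failures)   # 2000 hours / 4 failures
r = reliability(72.0, m)                      # hypothetical 72-hour mission
print(f"MTBF = {m:.0f} h, R(72 h) = {r:.3f}")
```

The same MTBF can correspond to very different mission outcomes depending on mission length, which is one reason the definitions that follow distinguish several kinds of reliability.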

Mission Reliability. Mission reliability is the probability that a system
will perform mission-essential functions for a period of time under the conditions stated
in the mission profile. In other words, it is the probability that no failure severe enough to
prevent satisfactory mission accomplishment will occur during the mission.

Logistics Reliability. Logistics reliability is the probability that no
corrective maintenance or unscheduled supply demand will occur following the
completion of a specific mission profile. Logistics reliability basically tracks the rate at
which failures cause logistics demands to be placed on the system, regardless of their effect
on the mission.

Maintainability. Maintainability is the probability that, if prescribed
procedures and resources are used, an item will be retained in, or restored to, a specific
condition within a given period. It is the inherent characteristic of a finished design that
determines the amount of maintenance required to retain or restore the system into a
specified condition. Corrective maintenance can be measured by Mean Time to Repair
(MTTR); or, stated in more simple terms, how quickly and easily the system can be fixed.
Also, Mean Maintenance Time (MMT) or Mean Time Between Maintenance (MTBM)
not only includes corrective (unscheduled) maintenance but also accounts for preventive
(scheduled) maintenance.

Availability. Availability is based on the question, "Is the equipment
available in a working condition when it is needed?" Availability is defined as the
probability that an item is in an operable and committable state at the start of a mission
when the mission is called for at a random point in time. The User is most concerned
about this parameter as it directly reflects the readiness of the system. There are a
number of definitions of availability, all based on a standard mathematical
relationship, with differing definitions of the terms "Up Time," "Down Time," and "Total
Time". Operational Availability (Ao) covers all time segments the equipment is
intended to be operational, and is the most desirable form of availability to be used in
helping assess a system’s potential under fielded conditions.
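
The standard relationship referred to above is the ratio of up time to total time. The Python sketch below applies it twice to one hypothetical year of usage data: once counting every source of down time, as an operational availability figure would, and once counting only operating time and active corrective maintenance, a narrower design-oriented view often written as MTBF / (MTBF + MTTR). The time categories and hour values are assumptions chosen for illustration, not figures from any program.

```python
def availability(up_time, down_time):
    """Fraction of total time the equipment is up: Up Time / Total Time."""
    total_time = up_time + down_time
    if total_time <= 0:
        raise ValueError("total time must be positive")
    return up_time / total_time

# Hypothetical one-year record (8,760 hours) for a fielded system.
operating = 6000.0
standby = 2000.0
corrective_maint = 300.0
preventive_maint = 160.0
logistics_delay = 300.0   # waiting for parts, personnel, or transport

# Operational view: every source of down time is counted.
a_o = availability(operating + standby,
                   corrective_maint + preventive_maint + logistics_delay)

# Design-oriented view: only operating time and active corrective repair.
a_i = availability(operating, corrective_maint)

print(f"Ao = {a_o:.3f}, design-oriented availability = {a_i:.3f}")
```

The gap between the two figures comes entirely from how "Up Time" and "Down Time" are defined, which is why the definition in use needs to be stated explicitly in requirements documents.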

Inherent Reliability. Inherent reliability is the potential reliability of a 
system, and assumes an ideal operating and support environment. 

A few nuances are worth mentioning here. It should be noted that redundancy, a
commonly practiced reliability design technique, while usually an improvement to mission
reliability, almost always has an adverse impact on logistics reliability. Table 1 contrasts
the differences between the two. Another interesting point is that MTBM is considered a
more logistically significant measure than MTBF as it captures both scheduled and 
unscheduled maintenance actions. 


LOGISTICS RELIABILITY                         MISSION RELIABILITY

• Measure of system’s ability to operate      • Measure of system’s ability to complete
  without logistics support                     mission

• Recognizes effects of all occurrences       • Considers only failures that cause
  that demand support without regard to         mission abort
  effect on mission

• Degraded by redundancy                      • Improved by redundancy

• Usually equal to or lower than mission      • Usually higher than logistics reliability
  reliability

Table 1. Characteristics of Reliability Performance
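
The opposite effects of redundancy summarized in Table 1 can be shown with a small numerical sketch. It assumes identical, independent units in active parallel redundancy, each with a constant failure rate; the mission succeeds if at least one unit survives, while any unit failure generates a maintenance demand. These modeling assumptions and the numbers used are illustrative only.

```python
import math

def unit_reliability(mission_hours, mtbf_hours):
    # Constant failure rate assumed for each unit (illustrative only).
    return math.exp(-mission_hours / mtbf_hours)

def mission_reliability(r_unit, n_units):
    # Mission succeeds if at least one of the redundant units survives.
    return 1.0 - (1.0 - r_unit) ** n_units

def logistics_reliability(r_unit, n_units):
    # No maintenance demand occurs only if every installed unit survives.
    return r_unit ** n_units

r = unit_reliability(mission_hours=24.0, mtbf_hours=200.0)
for n in (1, 2, 3):
    print(n,
          round(mission_reliability(r, n), 4),
          round(logistics_reliability(r, n), 4))
```

With a single unit the two measures coincide; each added unit raises mission reliability and lowers logistics reliability, because there is more hardware available to fail.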

2. Army Reliability Definitions

The following definitions provide the Army’s view of reliability from logistics, test,
and overall mission perspectives.

a. AR 700-127, Integrated Logistics Support 

Reliability is a fundamental characteristic of a system expressed as the 
probability that an item will perform its intended functions for a specified time under 
stated conditions. [Ref 5] 

b. DA Pamphlet 73-5, Operational Test and Evaluation Guidelines

Reliability deals with the assurance that a system will not encounter an
unacceptable number of failures during operation (frequency of failure), and is generally
expressed as an operational measure in terms of "Mean Time between Operational
Mission Failure." [Ref 6]

c. RAND Study for the Army on Mission Reliability for Future Forces

Reliability is the probability that a piece of equipment will successfully
perform its intended critical functions for a given duration measured in time or activity
under specified conditions. [Ref 7]

3. Commercial Definitions for Reliability 

The IEEE Reliability Society’s Standards Committee is working to develop a 
commercial standard to replace MIL-STD-785 “Reliability Program for Systems and 
Equipment Development and Production.” In reviewing other commercial reliability 
standards and in researching commercial websites on the subject, it was found that there 
is virtually no distinction between how the DoD and private industry define reliability. 

C. RELIABILITY POLICY, PROCEDURES AND GUIDANCE 

Truly reliable systems have far-reaching impacts that go beyond the system itself.
A reliable system will result in increased operational availability while requiring fewer
spares, fewer personnel with specialized skills, and an overall reduction in the combat
logistical footprint. Policies and regulations have been established to emphasize the
importance of reliability and to ensure that we are striving towards this end.

1. DoD 5000.2-R, Mandatory Procedures for Major Defense Acquisition
Programs 

DoD 5000.2-R states that as part of the acquisition strategy for a given program,
the PM shall develop and document a support strategy for life-cycle sustainment and
continuous improvement of product affordability, reliability, and supportability, while 
sustaining readiness. RAM activities described in DoD 5000.2-R are summarized below: 

• The PM shall establish RAM activities early in the acquisition cycle. 

• The PM shall develop RAM system requirements based on the 
Operational Requirements Document (ORD) and Total Ownership Costs
(TOC) considerations, and state them in quantifiable, operational terms 
that are measurable during development and operational test. 

• Reliability requirements shall address mission reliability and logistic 
reliability. 

• Availability requirements shall address the readiness of the system. 

• Maintainability requirements shall address servicing, preventive, and 
corrective maintenance. 

• The PM shall plan and execute RAM design, manufacturing development, 
and test activities so that the system elements, including software, used to 
demonstrate system performance before the production decision reflect the 
mature design. [Ref 8] 

2. AR 70-1, Army Acquisition Policy 

AR 70-1 implements DoD 5000.2-R and governs research, development,
acquisition, and life cycle management of Army materiel to satisfy approved Army
requirements. The regulation places responsibility squarely on the shoulders of the PM to 
implement an effective reliability and maintainability (R&M) program: 

• The R&M program will be tailored in scope and content and be designed 
to ensure that the user operational reliability requirements will be met at 
confidence levels established by the user.

• The PM is to actively participate with the User to establish R&M and
other system requirements. These efforts will justify the up-front 
investment in R&M design, engineering and test necessary to meet ORD 
requirements and if required, will justify the trade-off of R&M 
characteristics necessary to keep within established cost targets. 

• PMs are encouraged to utilize reliability growth planning tools and curves 
to evaluate progress towards meeting established R&M parameters. 
Intermediate program milestone thresholds and objectives should be 
developed from these curves. 

• PMs are to track fielded systems failure and repair histories starting at 
First Unit Equipped (FUE). This effort should focus on the identification 
of operating and support cost drivers that lead to improvements where 
they are cost effective. [Ref. 9] 
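
A reliability growth planning curve of the kind mentioned above can be sketched in a few lines. The example below uses the Duane model, a widely used empirical growth model in which cumulative MTBF increases as a power of accumulated test time. The choice of model is an assumption made for illustration; the regulation does not prescribe a particular one. The starting point, growth rate, and requirement are hypothetical.

```python
def cumulative_mtbf(t, t_initial, mtbf_initial, alpha):
    """Duane postulate: cumulative MTBF grows as a power of test time."""
    return mtbf_initial * (t / t_initial) ** alpha

def instantaneous_mtbf(t, t_initial, mtbf_initial, alpha):
    """Current (instantaneous) MTBF implied by the cumulative curve."""
    return cumulative_mtbf(t, t_initial, mtbf_initial, alpha) / (1.0 - alpha)

def hours_to_reach(mtbf_required, t_initial, mtbf_initial, alpha):
    """Test time at which the instantaneous MTBF reaches the requirement."""
    ratio = mtbf_required * (1.0 - alpha) / mtbf_initial
    return t_initial * ratio ** (1.0 / alpha)

# Hypothetical planning inputs.
T0 = 100.0        # test hours at which the initial MTBF was assessed
M0 = 50.0         # initial cumulative MTBF, in hours
ALPHA = 0.3       # assumed growth rate
REQUIRED = 150.0  # MTBF threshold derived from the requirement

# Intermediate values along the curve can serve as milestone thresholds.
for hours in (100, 500, 1000, 2000):
    print(hours, round(instantaneous_mtbf(hours, T0, M0, ALPHA), 1))

needed = hours_to_reach(REQUIRED, T0, M0, ALPHA)
print(f"about {needed:.0f} test hours to reach {REQUIRED:.0f} h MTBF")
```

Comparing demonstrated MTBF at each test phase against the planned curve gives the PM an early indication of whether the program is on track to meet the requirement before operational test.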

3. DA Pamphlet 70-3, Army Acquisition Procedures

DA PAM 70-3 provides discretionary guidance on materiel acquisition
management and addresses procedural guidance on reliability and maintainability
(R&M) requirements fairly well. It applies to all Army
organizations that have responsibility for the development, acquisition, and support of 
Army materiel. The guidance covers aspects of R&M Requirements, R&M 
Management, R&M Engineering and Design, R&M Testing, and R&M and Assessment 
Integrated Process Team (IPT) procedures. [Ref. 10] 

D. RELIABILITY AND THE ACQUISITION PROCESS

Managing reliability in a program starts by understanding the User’s system 
readiness and performance needs as part of the requirements generation process. 
Reliability performance should be continually assessed as part of an iterative process 
during development, test, and production, and on through fielding and sustainment. 
Reliability management requires constant attention and a reasonable, balanced
approach. The life cycle costs of a weapon system can be exceedingly high if
the reliability of a system is either excessive or inadequate.

1. Requirements Generation Process

The Combat Developer (CBTDEV) develops the Operational Requirements 
Document (ORD) and hence is ultimately responsible for defining the requirements 
relative to the reliability of the system. Typically this is defined in terms of operational 
availability and mission duration needs. Reliability requirements development, however, 
is not done in a vacuum. Developing quantitative operational reliability requirements, 
like all other ORD requirements, is a collaborative process between the CBTDEV and the 
Materiel Developer (MATDEV) using Integrated Product Teams/Integrated Concept 
Teams (IPTs/ICTs). This process provides a balanced solution between the best estimate 
of what is required to meet the user’s effectiveness, suitability, and survivability needs, 
and that which is actually affordable and technically achievable within program funding, 
risk, and time constraints. 

ORD reliability requirements are developed in accordance with AR 71-9. Three 
key elements combine to define overall reliability performance requirements. A change 
to any of these elements is a change to the basic requirement and requires appropriate 
coordination and approval. 

1) Reliability parameters (such as Ao) and their numerical values. 
The analysis and rationale supporting the development of these parameters is 
documented by the CBTDEV. 

2) Operational Mode Summary/Mission Profiles (OMS/MP). The 
OMS/MP is a supporting document that describes the mix of wartime and 
peacetime missions in which the system is required to perform, and the conditions 
(climate, terrain, battlefield environment, etc.) under which the missions are to be 
performed. 

3) Failure Definition and Scoring Criteria (FDSC). The FDSC is a
living document that matures as the program and system configuration evolve. It
defines the required functionality of the system and what constitutes a reliability
failure. The FDSC also establishes a framework for classifying and charging test
incidents. [Ref. 11]
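
To show how failure definition and scoring criteria feed directly into a measured reliability value, the sketch below scores a handful of test incidents and computes two figures from the same test hours. The scoring categories, incident descriptions, and hours are hypothetical; an actual program's criteria define their own classes and charging rules.

```python
from dataclasses import dataclass

# Hypothetical scoring categories; a real scoring document defines its own.
NO_FAILURE = "no_failure"         # e.g. test-peculiar event, not chargeable
LOGISTICS_FAILURE = "logistics"   # needs maintenance, mission continues
MISSION_FAILURE = "mission"       # loss of a mission-essential function

@dataclass
class Incident:
    hours_into_test: float
    description: str
    score: str

def mean_time_between(test_hours, incidents, counted_scores):
    """Test hours divided by the number of incidents in the counted scores."""
    count = sum(1 for i in incidents if i.score in counted_scores)
    return float("inf") if count == 0 else test_hours / count

incidents = [
    Incident(40.0, "display reboot, recovered by operator", LOGISTICS_FAILURE),
    Incident(95.0, "receiver lost lock, mission aborted", MISSION_FAILURE),
    Incident(130.0, "cable damaged during test setup", NO_FAILURE),
    Incident(210.0, "fan replaced at next halt", LOGISTICS_FAILURE),
    Incident(280.0, "processor card failed, mission aborted", MISSION_FAILURE),
]
TEST_HOURS = 300.0

mission_mtbf = mean_time_between(TEST_HOURS, incidents, {MISSION_FAILURE})
all_failures_mtbf = mean_time_between(
    TEST_HOURS, incidents, {MISSION_FAILURE, LOGISTICS_FAILURE})
print(mission_mtbf, all_failures_mtbf)
```

Rescoring a single incident changes the result noticeably in a short test, which illustrates why a change to the scoring criteria is treated as a change to the requirement itself.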

2. Systems Engineering Process

Given the trend towards development of increasingly complex weapon systems, 
reliability cannot be left as a matter of chance; it has to be consciously and proactively 
built into a system through good design and manufacturing practices. The starting point 
is the systems engineering process beginning with requirements definition and analyses, 
and the conduct of cost/benefit trade-off analyses to determine alternative requirements, 
allocations, and design solutions. 

a. Design Tools and Techniques 

Emphasis must be placed early on the use of proper design tools and
activities to “build in” reliability up front, rather than relying on extensive “back end”
testing and validation. Numerous reliability tools, methodologies and analysis techniques 
can be employed during the systems engineering process to ensure reliability 
requirements are realized. Effective application of these techniques can: reduce the need 
for reliability testing by achieving higher design reliability; reduce the need for costly 
fixes and upgrades; reduce system operations and support costs; and allow for more 
effective maintenance actions when failures do occur. The listing provided below is
intended to give the reader a general understanding of some of these tools and techniques.
The listing is not meant to be exhaustive or comprehensive in description.

• Physics of Failure (PoF). PoF is a proactive design technique used for
designing reliability into a system by identifying and understanding the
physical processes and mechanisms of failure. The purpose of using PoF
tools is to design out failures prior to test and fielding. Electronic 
applications can be conducted at the board and device level employing 
vibration, thermal, and fatigue analysis tools. Mechanical component 
applications include solid modeling, dynamics simulation, and finite 
element analysis tools used in support of determining component fatigue 
failure mechanisms. 

• Critical Items List/Analysis. Critical items are those requiring special
attention due to complexity, application of state-of-the-art technology,
high cost, single source, or single failure point components. Special
controls are required for these items to reduce their inherent risk.

• Identification of Potential Reliability Problems. Known reliability
problems (hardware/software, or procedural), their impacts, and proposed 
solutions or plans for resolution are identified in the design process. 

• Software Reliability Assessment. A software assessment of the
contractor identifies the metrics that will be used to measure the 
“goodness” of the product software reliability development process. 

• Redundancy. Redundancy offers continued system operation given
failure of one of the critical components/subsystems. Trade-offs to
consider when using this design technique are cost, increased maintenance, and
size, weight, and power (SWAP) increases.

• Variability Production Processes & Quality Assurance. This includes
processes and activities that will control defects and reduce variability 
resulting from manufacturing and production. Examples include statistical 
process control (SPC), six sigma, Taguchi methods, and ISO 9000. 

• Parts Control Program. Parts control helps maintain/increase inherent 
system reliability through the use of preferred standard parts to minimize 
variation. It can also be utilized to take advantage of new, more reliable 
technologies. 

• Allocation and Prediction. Reliability allocation is performed early on 
in the program and allows for trade-off studies to be performed in order to 
achieve the optimal combination of subsystem reliabilities that meets 
overall system requirements. The normal starting point is use of historical 
baseline data with adjustments based on type of technology and usage 
rates applied. 

• FMECA, FRACAS, and FTA. The Failure Modes, Effects, and Criticality 
Analysis (FMECA) is a tool that is used to identify potential failure modes 
and their impact on the system. A Failure Reporting and Corrective 
Action System (FRACAS) is the process by which failures of an item are 
tracked, analysis is conducted to determine root cause, and corrective actions 
are identified and implemented to reduce failure occurrence. Fault Tree 
Analysis (FTA) is a top-down model that graphically depicts all known 
events or combinations of events that can occur leading to a specific 
undesirable event. [Ref 12] 
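
The allocation and redundancy entries in the list above lend themselves to a short numerical illustration. The following is a minimal sketch, assuming statistically independent subsystems, a pure series arrangement for allocation, equal apportionment of the system requirement, and active (parallel) redundancy. The function names and the example values are hypothetical and are not drawn from any program discussed in this thesis. 

```python
import math


def equal_apportionment(system_reliability: float, n_subsystems: int) -> float:
    """Reliability each of n series subsystems must reach so that their
    product equals the system requirement (equal allocation assumption)."""
    return system_reliability ** (1.0 / n_subsystems)


def series_reliability(reliabilities) -> float:
    """System reliability when every independent subsystem must work."""
    return math.prod(reliabilities)


def parallel_reliability(reliabilities) -> float:
    """System reliability with active redundancy: the function is lost
    only if all independent redundant units fail."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


if __name__ == "__main__":
    # Hypothetical example: a 0.90 mission reliability requirement
    # allocated equally across three series subsystems.
    per_subsystem = equal_apportionment(0.90, 3)
    print(f"Allocated subsystem reliability: {per_subsystem:.4f}")

    # Hypothetical example: duplicating a 0.90-reliable critical unit.
    print(f"Redundant pair reliability: {parallel_reliability([0.90, 0.90]):.4f}")
```

In practice the allocation would be weighted by complexity, technology maturity, and historical baseline data rather than split equally; the equal split is used here only to keep the arithmetic visible. 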

b. Disciplines Involved in Reliability Processes 
A number of engineering, management, and logistic support disciplines 
come together and play a vital role in meeting a system’s reliability objectives. The types 
of expertise and timing required for different tasks vary and depend upon many factors, 
e.g., type and complexity of design, mission profile, operational and support resources 
and constraints, etc. Table 2 summarizes the types of expertise that are typically involved 
in the reliability design of a system. [Ref 13] 

Source: CPAT - Reliability Engineering, Air Force 

Table 2. Common Disciplines Involved in Reliability 

3. Testing and Evaluation 

Reliability testing serves a twofold purpose: 1) to mature the system, i.e., to reveal 
design and process deficiencies through reliability growth and pre-qualification testing so 
that corrective action may occur when it is least costly to fix; and 2) to determine 
compliance with the requirement through formal qualification or demonstration testing. 
Testing should complement design work, not replace it, and emphasis should be placed 
upon designing out failures well prior to formal reliability test events. Accelerated test 
strategies such as Highly Accelerated Life Testing (HALT) aid in the quick 
identification of weak parts and provide for quicker maturation. Test, Analyze, Fix, Test 
(TAFT) strategies can also be effective as long as sufficient resources exist (test assets, 
schedule time, and dollars) to support overall program acquisition timelines. Reliability 
qualification or demonstration tests and successful achievement of operational reliability 
requirements in the form of an operational test are required prior to production to 
demonstrate contractual compliance and operational suitability for fielding. 

Various contractor and government tests (both technical and operational) can be 
used to demonstrate compliance to contractual and operational reliability requirements. 
A partial listing is provided below. The listing is not meant to be exhaustive or 
comprehensive in description. 

• Environmental Testing . These types of tests are contractual 
qualification tests of the system’s ability to operate during and after 
exposure to environmental extremes and are typically conducted in 
environmental lab chambers. 

• Accelerated Testing . Accelerated testing techniques precipitate 
failure modes quickly by increasing the component or system’s stresses. 
To be cost effective, accelerated testing should be performed early in the 
system design. 

• Reliability Development/Growth Testing (RD/GT). RD/GT is a 
test-analyze-fix-test (TAFT) method used to surface failure modes on 
prototypes and production systems/subsystems so that fixes or corrective 
actions may be applied to mature reliability. Testing is conducted using 
the normal OMS/MP expected to be seen by the system. 

• Reliability Qualification/Demonstration Testing. In RQT or RDT, a 
“fixed configuration” type test, i.e., no fixes allowed as in RD/GT, is 
conducted to specifically demonstrate compliance with a reliability 
requirement. This type of testing can be conducted prior to a production 
decision, or post-production on systems from the first production lot to 
ensure the system has retained its inherent reliability in production. 

• Government Developmental Testing. These tests may take on forms 
of field environmental testing or tests to ensure achievement of technical 
performance, safety, supportability, durability and RAM. These tests may 
augment contractor system level integrated testing as well as operational 
testing. 

• Operational Testing. The decisive test for reliability entails testing in 
an operational environment in accordance with the system’s OMS/MP, 
with trained troops, using approved Army doctrine and tactics, techniques, 
and procedures. 

• Early User Test (EUT)/Limited User Test (LUT). EUT is an 
operationally oriented test conducted early in the acquisition process to 
gather data in support of a selection of a single system concept from 
multiple ones considered for continued development. This can provide 
early insight on the reliability of a chosen system. LUT is an operational 
test used to verify fixes or to satisfy effectiveness, suitability and 
survivability issues from a prior operational test. Estimates of operational 
reliability may be obtained to support a low rate production decision. 

• Initial Operational Test. IOT is the pinnacle test event for the system 
and for reliability. It is here that ORD reliability requirements must be 
met in order to support a full rate production decision. Data from other 
test events may be aggregated with IOT reliability data, given compliance 
with three criteria: 1) tests conducted under similar environments; 2) same 
production configuration; and 3) homogenous failure rates between tests. 

• Follow-On Test (FOT). Any deficiencies found during IOT, including 
those related to reliability, must be corrected. FOT serves the purpose of 
demonstrating those fixes so that the system can be declared effective, 
suitable, and survivable. [Ref 14] 
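
To give a sense of the test resources that a fixed-configuration demonstration test can demand, the sketch below computes the failure-free test time needed to demonstrate a required MTBF at a stated confidence level. It assumes a constant failure rate (exponential times between failures) and a test plan that allows zero failures; neither assumption is stated in the sources cited above, and the function names and sample values are hypothetical. 

```python
import math


def zero_failure_test_hours(required_mtbf: float, confidence: float) -> float:
    """Failure-free test hours needed to demonstrate required_mtbf as a
    lower confidence bound, assuming exponential times between failures.

    With zero failures in T hours, P(no failure) = exp(-T / MTBF), so the
    lower bound at confidence C satisfies exp(-T / MTBF) = 1 - C.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return -required_mtbf * math.log(1.0 - confidence)


def demonstrated_mtbf(test_hours: float, confidence: float) -> float:
    """Lower confidence bound on MTBF after test_hours with no failures."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return test_hours / -math.log(1.0 - confidence)


if __name__ == "__main__":
    # Hypothetical requirement: 500 hours MTBF at 80% confidence.
    hours = zero_failure_test_hours(500.0, 0.80)
    print(f"Failure-free test time required: {hours:.1f} hours")
```

Any failure observed during the test lengthens the required test time, which is one reason the text stresses designing out failures before formal test events. 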

4. Maintaining Reliability of Fielded Systems 

A PM’s responsibility is not over once a system is fielded. As mentioned earlier 
in this chapter, per AR 70-1, PMs are to track fielded systems failure and repair histories 
starting at First Unit Equipped (FUE). There is a good reason for this. Regardless of 
prior test results, estimates, or contractor predictions concerning the reliability of a 
system, high readiness rates must be upheld, and to do this a PM must ensure that proven 
reliability measures of a system are maintained. Among quality metrics, reliability is one 
of the most difficult to monitor and control. Although reliability of a system is tested 
throughout the acquisition process, reliability can be truly and accurately assessed only 
after a system has been in the field for some time. This implies collection of reliability 
field data. In addition, a PM’s data collection efforts should focus on the identification of 
operating and support cost drivers with respect to reliability (or other aspects of the 
system for that matter) that can be improved upon via engineering changes and product 
improvements, as long as they are deemed cost effective. 

Field data collection can provide information on warranty compliance as well as 
unresolved reliability issues from earlier operational testing. Of equal importance is the 
fact that these data will also serve as a historical baseline in support of the reliability 
requirements generation process for future systems. The Army measures reliability in 
the field by using specific, reportable criteria to determine availability measures such as 
operational availability (Ao) and fully mission capable (FMC) rates. Systems are fully 
mission capable when they can perform all of their combat missions without endangering 
the lives of crew or operators. The terms ready, available, and fully mission capable are 
often used to refer to the same status: equipment is on hand and able to perform its 
combat missions. 
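
The availability measure named above can be illustrated with a short sketch. It assumes the common definition of operational availability as the fraction of total time that a system is able to perform its mission (uptime divided by uptime plus downtime). The thesis does not give the formula, and the function name and sample hours below are hypothetical. 

```python
def operational_availability(uptime_hours: float, downtime_hours: float) -> float:
    """Operational availability as uptime / (uptime + downtime).

    Downtime here is assumed to include all time the system cannot
    perform its mission, whether for repair, supply, or administrative delay.
    """
    total = uptime_hours + downtime_hours
    if uptime_hours < 0 or downtime_hours < 0 or total == 0:
        raise ValueError("hours must be non-negative and not both zero")
    return uptime_hours / total


if __name__ == "__main__":
    # Hypothetical fielded system: 900 hours up, 100 hours down.
    print(f"Ao = {operational_availability(900.0, 100.0):.2f}")
```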





E. MANAGING RELIABILITY PERFORMANCE 

Part of the PM program office’s responsibilities entails performing timely and 
continuous assessments of progress towards achieving reliability performance 
requirements. This is accomplished with the use of appropriate phased testing to help 
measure and project reliability. Problem and failure reporting, tracking, analysis, and 
corrective action processes are utilized throughout the lifecycle of a program, with 
sufficient attention and resources allocated to this area. To help manage reliability 
activities throughout the development life cycle, the U.S. Army Materiel Systems 
Analysis Activity (AMSAA) has developed reliability growth methodologies for all 
phases of the process, from planning to tracking to projection. AMSAA’s Reliability 
Growth Handbook provides sound methodology for reliability growth concepts and is 
considered a good source for reliability best practices. [Ref 15] 

It is also important to motivate the contractor to maximize the inherent reliability 
of a system during development, so that costly fixes are not required later on. Contracts 
should be constructed that provide incentives to the contractor to proactively identify and 
fix reliability problems. There should be close coordination between the government 
program office and the contractor to ensure a balanced approach is achieved between 
system reliability and overall program requirements and objectives. 

1. Planning, Tracking, and Assessing Reliability Growth 

Reliability growth is an integral piece to achieving highly reliable systems and 
should be seriously considered for any significant development program, especially those 
that incorporate complex state of the art technologies. Reliability growth is the 
improvement in a reliability parameter over a period of time due to changes in product 
design or the manufacturing process. It occurs by surfacing failure modes and 
implementing effective corrective actions. The following benefits can be realized by the 
utilization of reliability growth management: 

• Finding Unforeseen Deficiencies 

• “Designing in” Improvement through Surfaced Problems 

• Reducing the Risk of Final Demonstration 

• Increasing the Probability of Meeting Objectives [Ref. 16] 

According to AMSAA, reliability growth management consists of planning, 
evaluating and controlling the growth process. 

a. Reliability Planning 

Reliability growth planning integrates program schedules, required levels 
of testing, and the resources available, and addresses the realism of the test program in 
achieving the requirements. A reliability growth program plan curve is constructed that 
quantifies interim reliability goals throughout the program. 

b. Reliability Growth Assessment 

It is essential that periodic assessments of reliability are made during the 
test program and compared to the planned reliability growth values so that emphasis can 
be placed where warranted. 

c. Controlling Reliability Growth 

Done properly, reliability growth allows for correction of system 
deficiencies while there is still time to affect the system design. The process can be 
controlled by making appropriate decisions regarding timing of fixes with respect to the 
program schedule milestones. 
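
The assessment step described above can be sketched numerically. The fragment below fits a power-law (Crow-AMSAA type) growth model to cumulative failure times from a time-truncated test and reports the instantaneous MTBF at the end of the test, which can then be compared against the planned growth curve value. The thesis does not specify an estimator, so the choice of model, the function names, and the failure times shown are assumptions made for illustration only. 

```python
import math


def crow_amsaa_fit(failure_times, total_test_time):
    """Maximum likelihood estimates (lam, beta) for the power-law model
    N(t) = lam * t**beta, for a test truncated at total_test_time.

    failure_times are cumulative test hours at each failure, each greater
    than zero and no greater than total_test_time.
    """
    n = len(failure_times)
    log_sum = sum(math.log(total_test_time / t) for t in failure_times)
    beta = n / log_sum
    lam = n / total_test_time ** beta
    return lam, beta


def instantaneous_mtbf(lam: float, beta: float, t: float) -> float:
    """Reciprocal of the failure intensity lam * beta * t**(beta - 1)."""
    return 1.0 / (lam * beta * t ** (beta - 1.0))


if __name__ == "__main__":
    # Hypothetical cumulative failure times (hours) over a 1,000 hour test.
    times = [10.0, 40.0, 100.0, 250.0, 600.0]
    lam, beta = crow_amsaa_fit(times, 1000.0)
    assessed = instantaneous_mtbf(lam, beta, 1000.0)
    planned = 400.0  # hypothetical interim goal from the planning curve
    print(f"beta = {beta:.3f} (values below 1 indicate growth)")
    print(f"assessed MTBF = {assessed:.1f} h, on plan: {assessed >= planned}")
```

A growth parameter below one indicates that failures are arriving less frequently as fixes are incorporated; an assessed value that falls short of the planned curve signals where management emphasis is warranted. 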

2. Contracting for Reliability 

Reliability objectives are translated into quantifiable and verifiable contractual 
terms, and should also be traceable to operational requirements. Prior to the advent of 
military specifications and standards reform in 1994, the work requirements for reliability 
engineering were usually described in a Statement of Work (SOW) task that required 
compliance with MIL-STD-785, “Reliability Program for Systems and Equipment 
Development and Production.” In February 1996, Mr. Gil Decker, the Army Acquisition 
Executive (AAE) at the time, issued policy on incorporating a performance-based approach to 
reliability in Requests for Proposals (RFPs). A key change was that no “how to” 
reliability standardization documents were to be used. The policy stated that: 

“Reliability requirements should be included in RFPs by specifying: (1) 
quantified reliability requirements and allowable uncertainties, (2) failure 
definitions and thresholds, (3) life-cycle usage conditions.” [Ref. 17] 

Mr. Decker’s policy was institutionalized in the update to AR 70-1 in January 1998. 
AR 70-1 clarified several points of the AAE memo. “Allowable uncertainties” pertain to 
statistical risks; “failure definitions and thresholds” are defined in Failure Definition and 
Scoring Criteria (FDSC); and “life-cycle usage conditions” refer to the OMS/MP of the 
system. 

Reliability parameters expressed by operational users and ones specified in 
contractual documents take many different forms. User requirements are generally 
expressed in a variety of forms that include combinations of mission and logistics 
reliability, or they may combine reliability with maintainability in the form of 
availability. Conversion from commonly used operational terms such as mean-time- 
between-maintenance (MTBM) and mean-time-between-critical-failure (MTBCF) must 
be made to enable translation to parameters which can be specified in contracts and 
verified in testing. 


CONTRACTUAL RELIABILITY 

• Used to define, measure and evaluate contractor’s program 
• Derived from operational needs 
• Selected such that achieving them allows projected satisfaction of 
operational reliability 
• Expressed in inherent values 
• Accounts only for failure events subject to contractor control 
• Includes only design and manufacturing characteristics 

Typical terms: 

• MTBF (mean-time-between-failures) 
• Mission MTBF (sometimes also called MTBCF) 

OPERATIONAL RELIABILITY 

• Used to describe reliability performance when operated in planned 
environment 
• Not used for contract reliability requirements (requires translation) 
• Used to describe needed level of reliability performance 
• Includes combined effects of item design, quality, installation environment, 
maintenance policy, repair, etc. 

Typical terms: 

• MTBM (mean-time-between-maintenance) 
• MTBD (mean-time-between-demand) 
• MTBR (mean-time-between-removal) 
• MTBCF (mean-time-between-critical-failure) 

Source: Reliability Engineer’s Toolkit, Rome Laboratory 

Table 3. Contractual vs. Operational Reliability 

F. COMMERCIAL VS. MILITARY RELIABILITY DIFFERENCES 

While there are numerous differences between the needs of the military customer 
and those of the commercial customer, the reliability needs of the military focus primarily 
on operational readiness (product performance on demand), operational longevity (long 
useful life vs. short life cycles), operational supportability (repair/replace vs. throwaway 
items), and operational robustness (satisfactory performance over environmental 
extremes). Table 4 provides an overview of the general differences between military and 
commercial customer needs. [Ref. 18] 

Source: Reliability Toolkit: Commercial Practices Edition 

Table 4. Characteristics of Military vs. Commercial Needs 


Although commercial products are less complex than defense weapon systems in 
general, the extreme difference in reliability requirements is quite startling: the Army 
requires levels of reliability in the hundreds to thousands of hours, whereas the 
commercial sector in some instances is asking for millions of hours or years. An 
example is commercial telephone switching equipment that has less than two hours of 
downtime in 40 years. Another example where similarly high reliability standards are in 
effect is at the National Aeronautics and Space Administration (NASA). Out of 
necessity, NASA has one of the most noted and perhaps best reliability programs in the 
world. The space systems it builds simply must work, and so NASA demands that 
contractors develop reliable products that meet extremely stringent reliability guidelines. 
For example, for software requirements NASA uses the following definitions in terms of 
probability of failure P(f) during a one-hour mission: 

a. Low Reliability: P(f) of greater than .001 

b. Moderate Reliability: P(f) of between .001 and .0000001 

c. Ultra Reliable: P(f) of less than .0000001 

Of course, getting to levels of reliability that are in the “ultra” range does not come cheap. 
Highly reliable systems, like anything else, come with very high price tags. 
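
To relate the hour-based requirements mentioned above to the P(f) categories, the following sketch converts an MTBF into a probability of failure for a one-hour mission and then applies the three thresholds quoted in the text. The conversion assumes a constant failure rate, which is an assumption of this illustration rather than part of the NASA definitions, and the categories were stated for software requirements, so applying them to a system MTBF is for scale only. The function names and sample MTBF values are hypothetical. 

```python
import math


def p_fail(mtbf_hours: float, mission_hours: float = 1.0) -> float:
    """Probability of at least one failure during the mission, assuming a
    constant failure rate: P(f) = 1 - exp(-mission_hours / MTBF)."""
    return -math.expm1(-mission_hours / mtbf_hours)


def reliability_category(p_f: float) -> str:
    """Apply the thresholds quoted in the text for a one-hour mission."""
    if p_f > 1e-3:
        return "Low"
    if p_f >= 1e-7:
        return "Moderate"
    return "Ultra"


if __name__ == "__main__":
    # Hypothetical MTBF values spanning the ranges discussed in the text.
    for mtbf in (500.0, 5_000.0, 1_000_000.0, 100_000_000.0):
        pf = p_fail(mtbf)
        print(f"MTBF {mtbf:>13,.0f} h -> P(f) = {pf:.2e} ({reliability_category(pf)})")
```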

G. THE COST OF RELIABILITY 

The “cost” implications of reliability are far-reaching. Systems that are highly 
reliable are not only force effectiveness multipliers; the collateral reliability benefits of 
reduced maintenance times, increased system availability, reduced training and 
manpower, and fewer spare parts in the inventory equate to a decreased logistical burden 
that has considerable impacts on life cycle cost reductions. 

Figure 1. Impacts of Reliability on Life Cycle Cost 

H. CURRENT RELIABILITY TRENDS & STUDIES WITHIN THE ARMY 

According to the Army Test and Evaluation Command (ATEC), the success rate 
for Army systems either in development or operational testing over a 5-year period from 
1996 to 2000 was only 36%, and of those failed tests, 61% failed to even achieve half of 
their reliability requirement. System operational test (OT) success rate with respect to 
reliability was only 20%. [Ref 19] 

Source: AEC Presentation to PEO IEW&S, 20 Sep 2001 

Army System Reliability Performance: 1996-2000 


The chart above represents operational test events that were used as the basis for 
demonstration of reliability requirements. All acquisition category (ACAT) levels are 
represented here. The types of OT events included field exercises, IOTs, FOTs, EUTs, 
and combined DT/OT. Points above the diagonal achieved their reliability requirement 
during testing, while those below did not. 

The issue of reliability performance, or lack thereof, has been an interest and 
concern at all levels of the Army lately. To its credit, the Army has chartered a 
Reliability and Maintainability (RAM) Panel to look at these concerns, identify problems, 
and explore solutions. A number of Army Reliability Workshops, sponsored by the 
Assistant Secretary of the Army for Acquisition, Logistics, and Technology (ASA(ALT)) 
and led by AMSAA, have been held over the past year to address shortfalls in the current 
process and enablers for improving the way the Army addresses reliability in the future. 
A number of sub-panels meet on a regular basis to focus on the top-level 
reliability issues listed below. The work of these panels is currently ongoing. 

• Adequacy of Reliability and Maintainability Requirements 

• Contracting to Design in Reliability and Maintainability 

• Reliability Validation 

• Management Enforcement 

• Adequacy of Reliability and Maintainability Workforce 

• Field Systems Data 


A number of recent studies have taken a closer look at reliability performance in 
weapon systems. One such study is the Army Science Board’s FY2000 Summer Study, 
Technical and Tactical Opportunities for Revolutionary Advances in Rapidly Deployable 
Joint Ground Forces in the 2015-2025 Era. One of the focuses of this study is the 
Army’s Future Combat System (FCS), a key cornerstone of the Objective Force and 
Army Transformation Vision. The Support & Sustainment sub-panel recommended 
making “ultra-reliability” a Key Performance Parameter (KPP) for FCS, and also went on 
to recommend increased use and reliance on Physics of Failure (PoF) techniques and 
emphasized the incorporation of embedded diagnostics/prognostics. Of importance to 
note is that the panel recommended mission reliability be a KPP for FCS, vice system 
reliability. [Ref. 20] 

I. CHAPTER SUMMARY 

In this chapter, the researcher provided a broad descriptive background on 
reliability and how it is managed today within the defense acquisition process. Policies 
and procedures for incorporating reliability within the management framework of 
acquisition programs were discussed, as well as how reliability is addressed as part of an 
iterative process during development, test, and production, and on through fielding and 
sustainment. A picture of the contemporary reliability environment within the Army 
today was presented to set the stage for further review. It is evident based on recent 
downward trends in reliability performance test results that there needs to be better 
management of the reliability “risk” in programs. The Army has initiated several efforts 
to address the reliability problem and get systems “back on track”. 

The next chapter presents results of a reliability performance survey that identifies 
reliability management techniques, issues, and methodologies employed by PM 
organizations within the Program Executive Office for Intelligence, Electronic Warfare & 
Sensors (PEO IEW&S). The survey included systems in various stages of the acquisition 
process and thus provides a good cross-section of how reliability can be managed across 
a program’s lifecycle. 

III. MANAGING WEAPON SYSTEM RELIABILITY PERFORMANCE 


A. INTRODUCTION 

This chapter identifies and discusses a variety of issues, common practices, 
concerns, and real-world experiences of Project and Product Managers as they relate to 
managing the reliability performance of Army weapon systems. Data is presented on 
programs ranging from ACAT I to ACAT III systems that are in various stages of 
development and production, from Concept and Technology Demonstration (CTD) 
through production and Operations & Support (O&S). The data was gathered through 
several sources: a reliability performance survey that was provided to each participating 
PM and program/project leader; interviews with program office personnel responsible for 
reliability testing; telephone calls; and emails. A copy of the survey that includes all of 
the questions and sub-questions is found in Appendix A. These questions were based on 
the literature review and the background research conducted on reliability as described in 
Chapter II. The questions were designed to draw out the practices employed by each PM 
organization (PMO) on managing reliability performance risks in their programs. 

This chapter is organized around four main areas. First, the general methodology 
and process used in conducting the survey is provided along with some basic 
demographics on the programs involved. Then, a corporate overview of the participating 
organization is provided, along with a brief description of each PM and the programs 
involved in the reliability survey. Next, survey question responses, grouped by common 
themes, are presented and summarized, and where appropriate, specific program 
experiences are provided to further illustrate key points made. Finally, chapter 
conclusions are presented. Note that the source for all tables found in this chapter is 
the author, based on responses to the reliability performance survey. 


B. METHODOLOGY 

Surveys were distributed to each PM organization via email with information 
regarding the objectives of the survey, and instructions for completing it. Survey 
respondents were typically not the PM him/herself, but rather either the Program/Project 
Leader (PL) or someone who had program responsibility in engineering, quality, or testing, 
or who had specific reliability expertise as part of their primary job duties in the PMO. 

1. Program Demographics 

A total of 18 programs from five PM organizations were asked to participate in 
the survey. The participation response was 100%. The programs participating cover the 
full spectrum of ACAT levels and cross all acquisition phases. This should provide a 
fairly representative cross-section of experiences with respect to weapon system 
reliability performance management. Table 5 generically summarizes the program 
demographics by depicting programs by phase, broken out by ACAT level. 


ACAT Level    MSA    MSB    MSC (LRIP)    MSC (FRP)    O&S 
ACAT I         -      -         -             1         - 
ACAT II        -      -         2             1         1 
ACAT III       1      2         4             4         1 
Non ACAT       -      -         -             -         1 

Table 5. ACAT and Acquisition Phase Demographics 


2. Survey Areas of Interest 

In all, there were 20 primary questions asked in the survey with some that had 
additional subparts. The surveys were developed to collect information on the eight main 
themes described below. 

• Management Approach to Reliability 

• Influencing Reliability Requirements 

• Contracting and Incentivizing for Reliability 

• “Designing-in” Reliability Upfront 

• Development and Operational Test Experiences 

• The Impact of Acquisition Streamlining and Downsizing 

• Commercial Practices 

• Maintaining and Improving Reliability in the Field 

3. Data Presentation 

The subsequent sections provide the data for this research and serve as the basis 
for analysis in Chapter IV. For the purposes of clarity, responses to the 20 survey 
questions are categorized into eight main themes: 1) Management Approach to 
Reliability; 2) Influencing Reliability Requirements; 3) Contracting and Incentivizing for 
Reliability; 4) “Designing-in” Reliability Upfront; 5) Development and Operational Test 
Experiences; 6) The Impact of Acquisition Streamlining and Downsizing; 7) Commercial 
Practices; and 8) Maintaining and Improving Reliability in the Field. Collectively, these 
eight themes correspond to issues addressed in the thesis research questions. 

Each theme is generally laid out into four basic subparts. First, the purpose and 
objective of the survey question(s) within the main theme are addressed. Second, roll-up 
tables or paraphrased responses to survey questions are presented. Third, responses are 
summarized for the reader. Finally, a few illustrative examples of reliability program 
management experiences are provided as appropriate, to exemplify real-world challenges 
that PMs are often confronted with in dealing with reliability issues of weapon systems. 

C. PROGRAM EXECUTIVE OFFICE INTELLIGENCE, ELECTRONIC 

WARFARE AND SENSORS 

The Program Executive Office for Intelligence, Electronic Warfare and Sensors 
(PEO IEW&S) has responsibility for oversight and management of Army programs that 
provide critical and timely intelligence and sensor data at all echelons: to command and 
control systems at brigade level and above, to ground combat platforms, and down to the 
individual combat soldier. Its mission is “To field and insert state-of-the-art, 
interoperable sensor capabilities and products which enable the land component 
commander to control time, space and environment, while enhancing survivability and 
lethality through continuous technology evolution and warfighter focus.” PEO IEW&S 
is the warfighter’s expert on the exploitation of the visual and non-visual 
electromagnetic spectrum for intelligence, surveillance, reconnaissance and electronic warfare. 
Their core product line of sensor capabilities is based on signals intelligence, radar, laser, 
electro-optic, and infrared imaging technologies. Fielding relevant, reliable capabilities 
to the soldier is of paramount importance to PEO IEW&S, and is considered Job #1. 

PEO IEW&S leads an organization consisting of four (O6-level) Project 
Management Offices, one (O6-level) Project Office, and two (O5-level) direct-report 
Product Managers. Approximately two-thirds of all PEO IEW&S programs participated 
in the reliability survey, and many provided additional data to support this research. A 
brief description of the systems that participated is provided in the following sections. 

1. Project Manager Common Ground Station (CGS) 

PM CGS is responsible for systems that provide situational awareness and target 
information through command and control systems and ultimately to the end users. Two 
systems managed by PM CGS participated in the reliability survey. 

a. Common Ground Station 

The CGS is a tactical data processing and evaluation center that links 
multiple air and ground sensors to Tactical Operation Centers (TOCs) at Echelons Above 
Corps (EAC), Corps, Division, and Brigade. CGS integrates imagery and intelligence data 
into a single visual presentation of the battlefield, providing commanders with near 
real-time situational awareness. A good portion of CGS is designed with commercial 
off-the-shelf (COTS) components. CGS is currently in full rate production (FRP). 

b. Joint Tactical Terminal/Common Integrated Broadcast Service 
Modules 

The JTT/CIBS-M is a family of tactical terminals that provide critical 
intelligence and targeting information to battle managers, intelligence centers, air 
defenders, fire support elements and aviation nodes across all services. JTT/CIBS-M is 
currently in low rate initial production (LRIP). 

2. Project Manager Night Vision/Reconnaissance, Surveillance and 

Target Acquisition (NV/RSTA) 

PM NV/RSTA provides capabilities that enable commanders and their soldiers to 
conduct decisive operations at any time of the day or night. It is responsible for an 
extensive product line of sensor systems that employ a wide range of technologies to 
include electro-optical systems, image intensifiers, thermal infrared devices, radars, 
lasers, and multi-sensor suite systems. Nine systems managed by PM NV/RSTA 
participated in the reliability survey. 

a. Second Generation Forward-Looking Infrared 

The SGF system provides ground combat platforms such as the M1 
Abrams, M2 Bradley, and the Long Range Advanced Scout Surveillance System with a 
common sensor for “own-the-night” operations. SGF is currently in FRP. 

b. Long Range Advanced Scout Surveillance System (LRAS3) 

The LRAS3 is mounted on a High Mobility Multipurpose Wheeled 
Vehicle (HMMWV) and provides real-time acquisition, target detection, recognition, and 
far target location information to the cavalry scout. LRAS3 is currently in FRP. 

c. Thermal Weapon Sight 

The TWS is a day/night thermal imaging device that is mounted on 
individual and crew-served weapon systems. TWS comes in three configurations: light, 
medium, and heavy, and they are all in various stages of LRIP and FRP. 

d. Driver’s Vision Enhancer 

The DVE provides drivers of combat and tactical wheeled vehicles with a 
low-cost thermal imager that allows mobility in all weather, day or night, and in 
battlefield obscurants. DVE is currently in FRP. 

e. Lightweight Video Reconnaissance System 

The LVRS captures and transmits still frame images for use at higher 
echelons and is employed by surveillance and reconnaissance teams. LVRS, which
consists primarily of COTS hardware and software, is currently in post-production 
operations and support (O&S) and is also undergoing several product improvements. 

f. Lightweight Laser Designator Rangefinder

The LLDR is a tripod-mounted day/night target designator with a digital
target location capability and is used by fire support teams. LLDR is currently in LRIP.

g. Image Intensification Systems 

Individual soldiers use various types of Night Vision Goggles (NVGs) in 
combat, combat support, and combat service support operations. The family of NVGs 
includes the standard AN/PVS-7D NVGs; the Monocular Night Vision Device (MNVD); 
and the Aviators Night Vision System (ANVIS). The image intensification technology 
used in NVGs has matured over the past two decades, and has increased performance and 
reliability with each new generation. All NVG systems are in FRP.

h. Profiler Meteorological Measurement System

The Profiler MMS is the next generation meteorological system that
provides weather prediction information to fire support systems. Profiler is currently in
system design and development (SDD).

i. Synthetic Aperture Radar/Moving Target Indicator Payload

The SAR/MTI payload will be the first of a series of advanced sensor
payloads to be developed for the Shadow Tactical Unmanned Aerial Vehicle (TUAV)
system. The SAR/MTI system recently transitioned from its Advanced Technology
Demonstrator (ATD) phase and is presently in the early stages of SDD.

3. Project Manager Signals Warfare (SW) 

PM SW provides overall management of Army ground and airborne electronic
warfare and signals intelligence collection systems. Three systems managed by PM SW
participated in the reliability survey.

a. Aerial Common Sensor 

The ACS is the Army’s objective airborne Intelligence, Surveillance and 
Reconnaissance (ISR) system. ACS will eventually replace the legacy Guardrail 
Common Sensor and Aerial Reconnaissance Low systems and is in the early stages of 
Component Advanced Development (CAD). 

b. Guardrail Common Sensor 

The Guardrail Common Sensor is a Corps level airborne signals 
intelligence system that is currently fielded in several locations in the U.S., Europe, and 
Korea, and is in post-production O&S. 

c. Prophet 

The Prophet system is the Division and Armored Cavalry Regiment 
Commander’s principal ground-based signals intelligence and electronic warfare system. 
Prophet Block I is currently in FRP. 

4. Project Manager Tactical Unmanned Aerial Vehicle (TUAV)

PM TUAV is designated as the Army’s centralized manager for tactical 
unmanned aerial vehicles. Two systems managed by PM TUAV participated in the 
reliability survey.

a. Shadow Tactical Unmanned Aerial Vehicle

The Shadow system provides the maneuver brigade commander with near
real-time RSTA, situational awareness, and battle damage assessment (BDA). Shadow is
currently in LRIP.

b. Hunter Tactical Unmanned Aerial Vehicle 

The Hunter system provides capabilities similar to those of the Shadow but at the
Division level. Hunter provides the Army with a training base and contingency
capability and is currently in post-production O&S.

5. Product Manager Combat Identification (CI)

PM CI programs address the need to minimize fratricide on the battlefield. Two
systems managed by PM CI participated in the reliability survey.

a. Battlefield Combat Identification System 

The BCIS provides tactical ground combat platforms with a question-and- 
answer combat identification system. BCIS is currently in LRIP.

b. Individual Combat Identification System 

The ICIDS is a dismounted soldier point-of-engagement fratricide 
prevention system. ICIDS is currently in LRIP.


D. RELIABILITY MANAGEMENT WITHIN PEO IEW&S

1. General 

PEO IEW&S is responsible for over 30 programs ranging from relatively simple
Thermal Combat ID panels to large, complex systems such as Guardrail and the CGS. In
the past year alone, from July 2000 to June 2001, PEO IEW&S and its PMs
have fielded over 15,000 items. The ensuing sections contain responses from programs
within PEO IEW&S to the reliability performance survey, augmented by some examples
of specific program experiences with reliability management challenges. Before those data
are presented, a brief examination of how PEO IEW&S maintains oversight in this area is
warranted.

2. Reliability Performance Oversight 

PEO IEW&S uses several methods to maintain “corporate” oversight of the reliability
performance of the weapon systems it manages.

a. Acquisition Program Baselines (APBs)

Each program has an APB that defines the cost, schedule, performance,
and supportability measures that it must meet, with thresholds and objectives that
serve as boundary parameters within which the PM operates. The APB serves as a
“contract” of sorts between the PM and the Milestone Decision Authority (MDA), which
in many cases is the PEO. Reliability-related parameters such as MTBF, Ao, MTTR, and
MTBM exist for each program in either the Performance or Supportability section of the
APB. The APB status of each program is reviewed once a quarter and at major reviews.

b. Acquisition Decision Memorandums (ADMs) 

When a program reaches a major milestone or experiences a significant
change in its program parameters, the outcome is documented in an ADM. These ADMs
document decisions made by the MDA, and typically include additional directive
statements with which the PM must comply. A review of all ADMs for existing programs
revealed that many included statements and directives related to achieving or improving
reliability levels for the programs. An ADM database tracking system has been
established within PEO IEW&S that provides the status of all open actions described in
program ADMs, including those related to reliability. This database is periodically
reviewed, with special attention given when a program is approaching its next milestone
decision review. Several examples follow of PEO IEW&S ADMs that place exit
criteria, constraints, or follow-on actions related to reliability performance.

• “The PM will have the contractor identify the reliability baseline and
their plan to integrate growth throughout the program's lifecycle. The PM
shall include a contractual incentive strategy to facilitate the same.”

• “Complete RDGT with measurable results that demonstrate ORD
threshold MTBOMF of 2,200 hours.”

• “...build sufficient quantities for system performance, reliability and
operational testing.”

• “Demonstrate the capability to have R&M that supports mission
accomplishments in an operational environment.”

• “The PM shall brief the PEO within 30 days of exercising the contract
options to demonstrate how the PM will ensure reliability performance of
at least 500 hours MTBF.” CONSTRAINT: “no additional work is to
commence until the PM-Contractor addresses the process used by the
contractor to demonstrate reliability required in the contract.”

c. Test and Evaluation Master Plans (TEMPs)

The TEMP for each program is reviewed to ensure that appropriate 
resources are available to support the test program for a given system. The TEMP 
usually addresses how, when, and where reliability performance will be tested. 

d. Sustainment Cost Management Annex (SCMAs) 

The Sustainment Cost Management Annex (SCMA) is a document that 
describes a PM’s approach towards Total Ownership Cost (TOC) for a system. SCMAs
are living documents, and are typically prepared as part of a program’s acquisition
strategy. The SCMA identifies a program's top ten O&S cost drivers, details plans and
resources required to reduce these costs, and provides metrics to measure progress. 
Several programs within PEO lEW&S have specific strategies for reducing TOC through 
improvements in the reliability of their systems. 

e. Program Reviews 

The reliability performance progress and plans for improving inherent 
reliability of a system are addressed at every major review of a program. As a PM, when
you show up at a review with the PEO, be prepared to answer the question “What are you
doing, and where are you in achieving the stated reliability of your system?”


E. MANAGEMENT APPROACH TO RELIABILITY

Purpose: The first series of survey questions focused on how reliability
performance and its associated risks are managed. These questions asked PMOs: 1)
based on their actual experiences, what they perceived to be the key factors that
contribute towards reliability risk in a program, and how they attempted to mitigate
these risks; 2) how reliability performance is managed within a PMO in terms of roles
and responsibilities, documentation, tracking progress, and reliability growth; and 3) their
level of understanding of DoD and Army policy and guidance concerning reliability.

1. Key Factors Contributing to Reliability Performance

Objective: The first area of focus was intended to get right to the heart of the
matter, that is, why do systems continue to struggle with reliability? Why do we often
fail to meet required reliability goals? As part of an ongoing series of Army Reliability
Workshops established by the Military Deputy ASA(ALT) to look at reliability concerns,
some common prevailing issues and concerns were identified by Army organizations
regarding how reliability is addressed in the acquisition process. [Ref. 21] A “Top 10”
list was developed and provided to all participants of the reliability performance survey
to rank as they saw fit, in order to gain better insights from those closest to the problem.
Next, given these known or perceived risk areas, PMs were asked what kinds of risk
mitigation techniques they employ to reduce these issues.

a. Top Ten Army Reliability Management Issues

The survey asked all participants to rank order what they felt were the
“Top 10” Army reliability issues, using the list developed by the Army RAM
panel. Respondents were given the opportunity to nominate their own issues as well.
Table 6 compiles all responses to provide an overall composite order of merit ranking.

Survey Responses:

"TOP 10" ARMY RELIABILITY ISSUES

 1. Poor reliability growth planning (test too late)
 2. Not aggressively "designing-in" reliability upfront
 3. Insufficient reliability testing to verify requirements
 4. Reliability is not a KPP
 5. Unrealistic reliability requirements/rationale
 6. Lack of qualified personnel in reliability management
 7. Inadequate policies and procedures
 8. Not designing sufficiently above requirement
 9. Contractors not using best commercial practices
10. Not consistently improving reliability after fielding

Table 6. Top “10” Army Reliability Issues

Summary: The top three reliability problems as ranked by the
respondents were: 1) Poor growth planning/testing too late; 2) Not aggressively
“designing-in” reliability upfront; and 3) Insufficient reliability testing to verify
requirements. These areas were clearly identified as especially problematic, as at least
half of all respondents chose each of these problems as one of their top three issues.
Interestingly though, each of the first seven ranked factors received at least one #1 vote.
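
One plausible way to arrive at a composite order-of-merit ranking such as Table 6 is to average the rank each respondent assigned to an issue and sort by that mean. The survey does not state which scoring rule was actually used, so the mean-rank rule, the function name composite_ranking, and the three sample ballots below are assumptions for illustration only.

```python
from collections import defaultdict


def composite_ranking(ballots):
    """Return issues ordered by mean rank (lower mean = more serious).

    ballots: list of dicts mapping issue -> rank (1 = most serious).
    Assumption: an issue a respondent left unranked is simply skipped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ballot in ballots:
        for issue, rank in ballot.items():
            totals[issue] += rank
            counts[issue] += 1
    mean_rank = {issue: totals[issue] / counts[issue] for issue in totals}
    # Sort by mean rank; break ties alphabetically for a stable result.
    return sorted(mean_rank, key=lambda issue: (mean_rank[issue], issue))


# Hypothetical ballots, not survey data.
ballots = [
    {"growth planning": 1, "designing-in": 2, "testing": 3},
    {"growth planning": 2, "designing-in": 1, "testing": 3},
    {"growth planning": 1, "designing-in": 3, "testing": 2},
]
print(composite_ranking(ballots))
```

A mean-rank rule hides how votes were distributed, which is why the observation that seven different issues each received a #1 vote is worth reporting alongside the composite order.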

b. Reliability Risk Mitigation Techniques

The following answers were in response to the survey question, “What
risk mitigation techniques does your program employ that address system reliability
performance?” Answers are paraphrased below.

Survey Responses:

• We leverage other test events. For example, we collect reliability data
when soldiers are training on the system. This gives us an opportunity to
better assess performance than just training data alone.

• Our contract has a hard requirement for Failure Analysis and
Corrective Action and includes an essential Reliability Growth program.

• We do Environmental Stress Screening (ESS) and environmental
testing to wring out early problems. RDGT is good for final core system
certification.

• Because of extremely low reliability indicators observed during
Engineering & Manufacturing Development (EMD) we have implemented
intense oversight of the reliability process to include ESS, HALT, and
RQT.

• Our program is in the early phases of Concept Exploration/Component
Advanced Development. We use a Probability Consequences Screening
model to identify risk management items. Its goal is to migrate high-risk/high-probability
candidates to a more manageable low-risk/low-probability level.

• The program convenes regular failure review boards to address
reliability failures as well as corrective actions.

• Test early and often. Use HALT, RDGT, tear down audits, ESS.

• A reliability allocation model is used for subsystems.

Summary: The response rate on this question was slightly higher than 50%,
whereas nearly every other survey question received full attention. The primary
methods and techniques for mitigating reliability performance risks include leveraging
other testing to gain valuable reliability data, testing early and often, and using reliability
growth to gain early knowledge and implement corrective action.


2. Managing Reliability in Acquisition Programs 

Objective: The next series of answers were in response to the questions
concerning how PMs manage reliability. The intent is to determine: a) how reliability
management is assigned in terms of roles and responsibilities within a PMO; b) whether there is
a process in place specifically for reliability management, and how it is formally
documented; c) whether there is a reliability growth strategy in place; and d) what measures
management employs to continually assess reliability performance and progress.

a. Roles and Responsibilities 

This question sought to determine how PMs delegated responsibility for
reliability activities within a program. If the reliability activities of a program are
conducted within the context of an Integrated Product Team (IPT), respondents were
asked if the IPT was formally chartered.

Survey Responses:

Responsible for Reliability Within PMO    Total    %      Chartered IPT?
                                                          Y      N
PM                                        1        6%
Project Leader                            1        6%
Systems Engineering Team Lead             2        10%
Logistics/Supportability Team Lead        3        16%
Test Team Lead                            1        6%
Reliability IPT                           9        50%    2      7
Prime Contractor                          0        -
No One Specifically                       1        6%

Table 7. Reliability Management Responsibility Within the PMO


Summary: Responses varied across the board on how PMs delegate
management responsibility with respect to reliability performance. None left it entirely
up to the contractor, and 50% of the programs have a Reliability IPT with representation
from many of the disciplines listed in the table above. Of the nine Reliability IPTs, only
two have formal charters.

b. Documenting a Program’s Reliability Management Approach

In order to provide visibility into the management and organizational
structure of those responsible (on both the government and contractor side) for the
conduct of reliability activities in a program, there should be definitive documentation on
all reliability activities, functions, processes, test strategies, measurement/metrics, data
collection, resources and timelines required to ensure reliability system maturation.
Each PMO was asked how the system reliability program was formally documented
within their program. Responses are provided in Table 8 below.


Survey Responses:

Reliability Documentation Within PMO         Program      % of
                                             Responses    Programs
Reliability Program Plan                     3            17%
Contract Statement of Work (SOW)             12           67%
Test and Evaluation Master Plan (TEMP)       6            33%
Single Acquisition Management Plan (SAMP)    2            11%
No Formal Reliability Management Plan        15           83%
Other                                        6            33%

Table 8. Types of Reliability Documentation Within a PMO


Summary: The majority of responses indicate that, although reliability
is addressed throughout various program documentation, there is no single guiding
document, e.g., a “Reliability Program Plan,” that provides a comprehensive compendium
of program reliability activities. It should be noted that there is no requirement for PMs
to have such an overarching document, but some in fact do. Of 18 programs surveyed,
83% (15 programs) had no formal Reliability Program Management Plan. Most rely on
the contract SOW and the TEMP, or other documentation, to address such things as how
they intend to ensure reliability is treated as a high-priority objective, methodologies used
to measure and project reliability, resources needed to execute the program, and future
plans for monitoring reliability in the field.

Illustrative Examples:

1) Thermal Weapon Sight (TWS). During the solicitation process,
offerors were required to submit a Quality Validation Plan (QVP) outlining how they
proposed to assure reliability and other specification requirements. This QVP became
part of the contract after contract award. Because of a number of reliability problems
experienced by TWS, a Reliability Assurance Plan was developed to address the
management approach for assuring that reliability is maintained throughout production. The
approach includes reliability testing and the development of metrics to track key
performance subsystems that directly affect reliability.

c. Reliability Growth in a Program 

Reliability growth is the improvement in a reliability parameter over a
period of time due to changes in product design or the manufacturing process. Some
programs treat a risk reduction method referred to as Test-Analyze-Fix-Test (TAFT) as
reliability growth; however, a structured reliability growth program is typically devised with
specific interim reliability goals and test events. Managing reliability growth entails
systematic planning for reliability performance achievement as a function of time and
other resources, and involves controlling the ongoing rate of achievement by reallocation
of resources based on comparisons between planned and assessed reliability values. [Ref.
22] Reliability growth management techniques are typically employed on complex
systems that use state-of-the-art technologies where the requirements for reliability,
maintainability and other performance parameters are highly demanding. All survey
participants were asked whether their program incorporates a reliability growth program
(RGP). Where applicable, responses were further broken out in accordance with the
reliability performance achieved during their reliability qualification test (RQT) or initial
operational test (IOT). Survey responses are provided in Table 9 below.

Survey Responses:

Does Your Program Incorporate    Program      Passed Reliability Requirement in RQT or Initial OT?
a Reliability Growth Program?    Responses    Y      N      Did Not Have Yet
Yes                              33%          3      1      2
No                               67%          2      8      2
N/A

Table 9. Reliability Growth Programs


Summary: Reliability growth is an iterative design process. As the
design matures, testing is performed at planned intervals to identify actual or potential
sources of failures. The intent is to gain knowledge and learn from early design mistakes,
and then focus on fixing these as early as possible. For PEO IEW&S, two-thirds of all
programs (12 of 18) surveyed did not initially implement a reliability growth program.
After experiencing problems in either RQT or IOTE, these numbers have generally
reversed, with nearly two-thirds of all programs now employing some type of growth
program. Note the high correlation of reliability-related problems during testing with
those that did not initially incorporate an RGP.
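
A planned growth curve supplies the interim goals against which assessed values are compared. The sketch below uses the Duane power-law model discussed in MIL-HDBK-189, in which cumulative MTBF grows as a power of cumulative test time. The choice of model, the growth rate, and every number shown are assumptions for illustration and are not taken from any PEO IEW&S program.

```python
def cumulative_mtbf(t, m_initial, t_initial, alpha):
    """Duane cumulative MTBF after t cumulative test hours (t >= t_initial)."""
    return m_initial * (t / t_initial) ** alpha


def instantaneous_mtbf(t, m_initial, t_initial, alpha):
    """Duane instantaneous (current) MTBF at cumulative test time t."""
    return cumulative_mtbf(t, m_initial, t_initial, alpha) / (1.0 - alpha)


def hours_to_reach(goal_mtbf, m_initial, t_initial, alpha):
    """Cumulative test hours at which the instantaneous MTBF equals the goal."""
    return t_initial * (goal_mtbf * (1.0 - alpha) / m_initial) ** (1.0 / alpha)


# Hypothetical plan: 50 h cumulative MTBF after the first 100 test hours,
# growth rate (alpha) of 0.4, goal of 200 h instantaneous MTBF.
t_goal = hours_to_reach(200.0, m_initial=50.0, t_initial=100.0, alpha=0.4)
print(round(t_goal), "test hours needed under these assumptions")
```

Evaluating instantaneous_mtbf at a few intermediate test times yields the kind of interim reliability goals that a structured growth program tracks.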

Illustrative Examples:

1) BCIS RDGT. The BCIS program employed an RDGT strategy that took
into account factors such as constraints based on available test hours, time to implement
fixes, and availability of test assets. The program derived the number of test hours
required to demonstrate, with confidence, the requirement of 1380 hours MTBEFF given
test resources of three BCIS units for four months and an estimate of the expected
number of failures that would be experienced. Appendix B provides a summary of the
Program Office's approach in an information paper, BCIS Reliability Development
Growth Test (RDGT) Strategies. [Ref. 23]
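
The kind of calculation behind such a strategy can be sketched as follows. Assuming a constant failure rate and a time-terminated test, the total test hours needed to demonstrate a required mean time between failures at a given confidence depend on how many failures are allowed. This is the standard Poisson (equivalently chi-square) relationship and is not necessarily the method the BCIS program office used, which is documented in Appendix B. The 80% confidence level and the function names are assumptions; only the 1380-hour requirement comes from the text.

```python
import math


def poisson_cdf(r, mean):
    """P(N <= r) for N ~ Poisson(mean)."""
    term = math.exp(-mean)
    total = term
    for k in range(1, r + 1):
        term *= mean / k
        total += term
    return total


def required_test_hours(mtbf_required, allowed_failures, confidence):
    """Total test hours T such that seeing at most `allowed_failures`
    failures in T hours demonstrates `mtbf_required` at `confidence`."""
    target = 1.0 - confidence
    lo, hi = 0.0, 1.0
    while poisson_cdf(allowed_failures, hi) > target:  # bracket the root
        hi *= 2.0
    for _ in range(100):  # bisection on the expected number of failures
        mid = (lo + hi) / 2.0
        if poisson_cdf(allowed_failures, mid) > target:
            lo = mid
        else:
            hi = mid
    return mtbf_required * (lo + hi) / 2.0


# 1380 h is the requirement quoted above; 80% confidence is assumed.
for failures in range(4):
    print(failures, round(required_test_hours(1380.0, failures, 0.80)))
```

Each additional allowed failure raises the required test time, which is why the expected number of failures and the available unit-hours have to be weighed together when planning the test.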

2) The Hunter TUAV Reliability Growth Success Story. The Hunter
TUAV System has been in operation since 1991. As a result of all the FRACAS data
collected over the years, the Hunter Reliability IPT has made some smart decisions based
on a Reliability Growth Management Plan that have allowed the system MTBF to “grow”
three-fold from 3.6 to 10.9 hours, and Ao to grow from 85% to 98%. During system
acceptance testing in 1995, several Hunter air vehicles were lost due to various failures,
which resulted in a decision to terminate the follow-on production program. The Army
wanted to benefit as much as possible from the substantial investment made, so an “end
to end” Failure Mode, Effects and Criticality Analysis (FMECA) and a Fishbone Analysis
were performed on all the critical subsystems to identify the root causes, with resultant
corrective actions implemented. The Hunter system is still flying today, in support of
training base activities at the National Training Center (NTC) and Joint Readiness
Training Center (JRTC), contingency operations in the Balkans, and in support of TUAV
advanced payload demonstrations. For more complete details on the Hunter TUAV
Reliability Growth Success Story, see Appendix C of this report. [Ref. 24]



Figure 3. Hunter TUAV Reliability Growth 


d. Tracking and Measuring Reliability Performance Progress

A well-known saying contends, “you cannot manage what you do not
measure.” PMs were asked to address the methodologies used to measure and track
reliability in their programs. This is particularly important in reliability growth programs,
as projection methodologies serve not only to ascertain requirement compliance, but also as a
means of identifying potential problems early in the process. Thresholds, or intermediate
benchmarks representing minimum reliability achievement levels, should be established at
different points along the program as risk mitigation measures. A breach of one of these
thresholds is a signal that the program is not on track to meet reliability requirements, and
some form of intervention to rectify the problem is required. Table 10 provides answers
to the survey question “How do you measure and track reliability performance progress
over time in your program?” Respondents were asked to check all that applied.

Survey Responses:

How is Reliability Performance Progress          Program      % of
Measured and Tracked?                            Responses    Programs
By contractor projections/analysis               7            39%
Reliability growth tracking methodology          3            17%
At major reviews (PDR, CDR, TRRs, etc.)          9            50%
By testing (e.g., RQT, RDGT, ESS, IOT, etc.)     6            33%
Other (warranty, or TBD for new programs)        2            11%

Table 10. Measuring and Tracking Reliability Progress


Summary: All programs use some means to monitor reliability
performance progress of their system during development. Although the list of possible
methods and opportunities for measuring and tracking reliability progress generated for
this survey question is not all-encompassing, the responses indicate that programs rely
heavily on their contractors for indicators of reliability growth.
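
The interim thresholds described in this subsection lend themselves to a simple check at each assessment point. The sketch below flags any checkpoint where the assessed value falls below its threshold. The data structure, checkpoint names, and numbers are hypothetical and are not drawn from any surveyed program.

```python
def find_breaches(thresholds, assessed):
    """Return the checkpoints whose assessed MTBF is below the threshold.

    thresholds, assessed: dicts mapping checkpoint name -> MTBF in hours.
    Checkpoints that have not been assessed yet are ignored.
    """
    return [
        name
        for name, minimum in thresholds.items()
        if name in assessed and assessed[name] < minimum
    ]


# Hypothetical interim thresholds and assessed values (hours MTBF).
thresholds = {"PDR": 100.0, "CDR": 200.0, "TRR": 350.0, "IOT": 500.0}
assessed = {"PDR": 120.0, "CDR": 180.0, "TRR": 360.0}
print(find_breaches(thresholds, assessed))
```

A non-empty result is the signal for intervention; an empty result shows only that no threshold assessed so far has been breached.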

Illustrative Examples:

1) SGF Program. For the SGF program, reliability conformance
inspections are conducted annually on TIS and CITV throughout production.

3. Policy, Procedures and Guidance

Objective: DoD 5000.2-R states that the “PM shall establish RAM activities
early in the acquisition cycle.” AR 70-1 continues by requiring that an “R&M program will
be tailored in scope and content and be designed to ensure that the user operational
reliability requirements will be met at confidence levels established by the user.” Finally,
DA PAM 70-3 guidance covers aspects of R&M Requirements, R&M Management,
R&M Engineering and Design, R&M Testing, and R&M and Assessment Integrated
Process Team (IPT) procedures. The question posed to PMOs was “Are you aware of
any specific DoD or Army policy/regulation regarding weapon system reliability
management? If yes, do you use it to help you manage reliability?” Answers provided in
Table 11 help to determine the level of awareness of reliability policy, regulations and
procedures, and whether these are sufficient to help a PM manage reliability performance
in a program.

Survey Responses:

Reliability Policy    Program
Awareness?            Responses    %
YES                   8            45%
NO                    4            22%
NOT SURE              6            33%

Table 11. Awareness of Policy, Procedures, and Guidance

Additional responses are paraphrased below: 

• Given the acquisition reform process, it is difficult to identify which 
policies/regulations for reliability are applicable at any given time. 

• I am aware of DA PAM 750-40, Guide to Reliability Centered Maintenance 
(RCM) for Fielded Systems, however, this may not be the best guidance to give a 
contractor until the later stages of development.

Summary: Slightly over half of those individuals responsible for
managing reliability performance are either unaware of or not sure of existing policy and
regulations. Those who answered in the positive cited the following policies and
regulations as ones that they still use or refer to: AR 70-1, AR 73-1, DA PAM 73-1, and
(guidance only) MIL HDBK 781, MIL STD 1635, MIL STD 785, MIL HDBK 217, MIL
STD 470, MIL STD 1629, MIL HDBK 189, and ISO 9001.

F. INFLUENCING RELIABILITY REQUIREMENTS 

Purpose: The next series of answers address reliability in the context of inputs to
the requirements generation process. The purpose of these questions was to explore
whether a reasonable and cooperative process existed, and whether requirements for
reliability were set arbitrarily. A secondary line of questioning explored the relative
importance of reliability with respect to other key performance parameters in the ORD.

1. Influencing Realistic Reliability Requirements

Objective: The intent of the next question was to determine if the MATDEV is
involved in influencing development of realistic reliability requirements in ORDs. A
criticism of the defense acquisition process is that weapon system requirements are either
not adequately defined or are unrealistic with respect to the state-of-the-art. The
challenge becomes one of stating the reliability requirements in terms of operational
mission needs and success under given conditions, with defined mission profiles and
durations. Table 12 provides a summary of responses with respect to how PMs were able
to influence this process for PEO IEW&S programs. Table 13 provides a summary of
the types of reliability measures found in program ORDs. Note that some programs use
more than one parameter to describe reliability-related requirements of a system.

Survey Responses:

Ability to Influence Reliability    Program
Requirements in the ORD?            Responses    %
YES                                 14           78%
NO                                  4            22%
Other

Table 12. Influencing the Requirements Process


Reliability Parameters in ORDs         Programs    %
MTBSA                                  6           33%
Ao                                     5           28%
MTBOMF                                 7           39%
% Prob Completing Mission w/out EFF    1           6%
MTBEFF                                 3           17%
MTBOMA                                 1           6%
MTBMAF                                 1           6%

Table 13. Reliability Parameters in ORDs

Summary: A large majority of programs do participate with the
COMBATDEV as part of an Integrated Concept Team (ICT) to derive appropriate ORD
requirements, including those related to reliability as part of the RAM rationale process.
This is not universal, however, as 4 respondents claimed that reliability
requirements were developed without the MATDEV’s input. A review of reliability
requirements in various ORDs also shows that there is no standard lexicon for
expressing reliability in operational terms.

Illustrative Examples:

1) ACS ORD. The ACS program completed Concept Exploration (CE) and
transitioned to Component Advanced Development (CAD) in early FY02. During the
CE phase, competing contractor teams were required to perform a sensitivity analysis on
the aircraft range requirement and associated reliability to determine the O&S cost
implications, because the airframe capabilities are the largest cost driver
in the program. They also present the best opportunity for cost savings, so the intent
of the PM was to have the contractors provide airframe recommendations that comply
with all other ACS Key Performance Parameters (KPPs), but may require alternate
wording of the KPP associated with the airframe capability. The current requirement is
that the ACS must be capable of self-deploying 2500 NM unrefueled with any mission
payload, initiating operations immediately upon arriving in theater, and sustaining
operations for a minimum of fourteen days. Possible alternate wording is that the
airframe be self-deployable worldwide within a fixed timeframe and with increased
reliability.

2. Reliability as a Key Performance Parameter (KPP)

Objective: Key Performance Parameters (KPPs) are those ORD capabilities or
characteristics considered essential for mission accomplishment. Failure to meet an ORD
KPP threshold can be cause for the system selection to be reevaluated or the program to
be reassessed or terminated. The intent of this next line of questioning is to determine the
relative importance given to reliability performance in ORDs, and to assess where it
stands in terms of requirements “tradespace.” Table 14 provides responses of the 18 PEO
IEW&S systems as to where in their respective ORDs the reliability requirement ranks.

Survey Responses:

ORD RELIABILITY REQUIREMENT

            KPP    Band "A"    Band "B"    Band "C"
                   Priority    Priority    Priority
Programs    3      9           1           5
%           17%    50%         5%          28%

Table 14. Reliability Requirements in the ORD


Summary: Two-thirds of all programs surveyed have reliability prioritized in
the ORD as either a KPP or in Band “A”. Lower priority does not necessarily mean less
importance. It may be that the maturity of the technology is well known, such that
reliability requirements are easily achievable, and therefore of less concern compared to
other critical, less mature performance parameters of the system.

G. CONTRACTING AND INCENTIVIZING FOR RELIABILITY 

Purpose: The next series of answers are in response to questions concerning how
reliability is handled in the source selection and contracting process.

1. Reliability Requirements in Contracts

Objective: The first survey question, on how reliability requirements are addressed
in contracts, assessed two issues: 1) the
significance of reliability in the source selection process; and 2) the method of translating
operational ORD reliability requirements into quantifiable and verifiable contractual
terms. The second question addresses whether or not specific reliability incentives are
employed, and if so, whether the incentives are achieving their desired effect.

Survey Responses : 


RELIABILITY AS A FACTOR IN SOURCE SELECTION 

        YES          NO 
        50%          50% 


Table 15. Reliability as a Factor in Source Selection 



The second part of this question asked how operational reliability requirements in 
the ORD are translated into contractual requirements. Responses are summarized in 
Table 16. 


TRANSLATION OF ORD RELIABILITY REQUIREMENTS TO CONTRACTUAL REQUIREMENTS 

                                           Programs       % 
ORD Requirement Restated in SOW                5         28% 
Additional Levels Applied to Contract         13         72% 


Table 16. Translation Between Operational and Contractual Requirements 

Summary : Half of those programs that participated in full and open 
competition used reliability as a factor or sub-factor in source selection. Of those that 
did, only half found reliability to be a significant discriminator in the decision process. 
Several Night Vision programs viewed reliability as a “best value” item. One program, 
the JTT/CIBS-M, deemed reliability not to be a significant factor since the program relied 
on a ten-year warranty and a 72-hour turn-around time to meet the Ao. 

In terms of translating operational requirements to contractual ones, most 
programs add additional levels onto the reliability requirement to account for the 
difference between in-house bench or chamber testing and operational testing in the 
field, and the methodologies for doing so vary. Some increased the requirement by a 
factor of 2 to account for the simulated operational environment in a DT test. 
Twenty-eight percent of programs, however, simply restate the ORD requirement in the 
contract SOW or Specification. 

Illustrative Examples : 

1) Translation of Operational Requirements to Contractual Requirements. This 
example from the BCIS program illustrates one method used for deriving a contractual 
reliability requirement from an ORD reliability value. This methodology is based on 
MIL-HDBK-781. An ICT consisting of HQ TRADOC, the MATDEV, and Army 
Evaluation Center (AEC) RAM personnel determined the BCIS ORD MTBOMF 
requirement to be 1242 hours based on similar equipment capabilities. Starting with the 
1242 hours ORD value, approximately 10% is then added based on AEC RAM military 
field studies for electronic equipment to get to 1380 hours. Next, a reasonable 
consumer/producer risk level of 20% each is apportioned to get the proper statistical 
confidence levels, and this provides a contractual value of 2760 hours. The stated design 
goal of 3450 hours adds a calculation factor of approximately 20% for lab versus final 
field performance histories. Finally, the value was nearly doubled to around 6500 
because two BCIS are required to complete an interrogation. BCIS achieved 3255 hours 
MTBOMF and thus exceeded the ORD requirement for multiple tank battles. [Ref. 25] 
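
The arithmetic in the BCIS example can be retraced with a short script. The sketch 
below is illustrative only: the stage values are the ones quoted in the example, while the 
stage labels and the function name step_ratios are assumptions introduced here and are 
not part of the BCIS methodology. It simply reports the multiplier implied between 
consecutive stages. 

```python
# Stage values in hours MTBOMF, copied from the BCIS example in the text.
# The labels are shorthand introduced for this sketch.
STAGES = [
    ("ORD requirement", 1242.0),
    ("field-study adjustment", 1380.0),
    ("contractual value", 2760.0),
    ("design goal", 3450.0),
]


def step_ratios(stages):
    """Return (label, hours, multiplier relative to the previous stage)."""
    rows = []
    previous = None
    for label, hours in stages:
        ratio = None if previous is None else hours / previous
        rows.append((label, hours, ratio))
        previous = hours
    return rows


if __name__ == "__main__":
    for label, hours, ratio in step_ratios(STAGES):
        note = "" if ratio is None else f"  (x{ratio:.2f} over previous stage)"
        print(f"{label:<24}{hours:>8.0f} h{note}")
```

The implied multipliers are roughly 1.11, 2.00, and 1.25. The last step can also be read 
as the contractual value sitting 20% below the design goal (2760 / 3450 = 0.80), which is 
consistent with the “approximately 20%” wording. 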

2. Contracting Incentives for Reliability 

Objective : Providing meaningful contract incentives for achieving stated 
reliability performance is a potential method for motivating contractors. The objective 
of this question was to determine whether reliability incentive methods were being 
employed and, if so, whether they achieved their desired effect. 


Survey Responses : 


Are Reliability Incentives      Program               If Yes, Did The Incentives Achieve 
Incorporated Within the         Responses      %      Their Desired Effect? 
Contract?                                             Y      N      Too Early to Tell 

Yes                                 1          6%     -      -             1 
No                                 17         94% 


Table 17. Reliability Incentives in Contracts 


Summary : An extremely low percentage of contracts (only 6%) include 
reliability incentives. Of the 18 programs surveyed, only one employed reliability 
incentives in their contract. The one program that did, the Prophet program, deemed it 
too early to tell if these incentives achieved their desired effect based on field data. 
Prophet did, however, exceed its reliability requirement in operational testing. 

Illustrative Examples : 

Good reliability is critical for unmanned systems. The Shadow TUAV system, 
which is currently in LRIP, plans to implement a reliability incentive contracting 
approach for its follow-on full rate production contract. The PMO is currently assessing 
different incentive methods to motivate the contractor to continuously improve reliability 
after fielding through an incentive fee tied to achieving or exceeding failure rate goals, 
operating dollars/flight hour goals (power-by-the-flight-hour), or attainment of full 
mission capable rate goals. The benefits of incentivizing reliability improvement include 
shared risk, increased availability, reduced inventory level, and an environment that 
encourages continuous process improvement. 

H. “DESIGNING-IN” RELIABILITY UPFRONT 

Purpose : The responses that follow provide insight into the types of tools, 
techniques, and processes that PMs and their contractors employ to address reliability early 
on in the development of a system. “Designing-in” reliability up front in a system 
reduces risk and is less costly than finding design issues later on at the “back 
end” during testing and validation. The point is that robust test programs alone cannot 
guarantee reliability; it must be proactively addressed in the upfront design of a system. 

Objective : The intent of this next question is to assess the types of design tools 
and methodologies employed by PMs as best practices to “design-in” reliability upfront 
in a program. Table 18 provides a summary of the survey responses. 

Survey Responses : 


Types of Design Tools Used to "Design-in"       Program 
Reliability Upfront in a Program                Responses       % 

Physics of Failure (PoF)                            1           6% 
Critical Items List/Analysis                        8          44% 
Identification of Known Problem Areas              14          78% 
Software Reliability Assessment                     7          39% 
Quality Function Deployment                         2          11% 
Parts Control Program                               5          28% 
FMECA/FRACAS/FTA                                    8          44% 
Reliability Prediction Analysis                     3          17% 

Table 18. Reliability Design Techniques and Methodologies 

Summary : Emphasis should be placed early on the use of proper design 
tools and activities to “build in” reliability up front. There are numerous reliability 
design tools and techniques that can be used to ensure reliability requirements are realized. 
Responses indicate that the primary method used by PEO IEW&S programs is 
“identification of known problem areas,” and that the other available design tools and 
techniques are being only sporadically utilized by PMOs. 

I. DEVELOPMENT AND OPERATIONAL TEST EXPERIENCES 

Purpose : Testing is the final validation of reliability performance requirements. 
The next series of answers are in response to questions concerning: 1) the adequacy of 
available time (schedule) and resources dedicated to reliability; 2) the types of testing 
conducted during development to continually assess progress and gain knowledge in 
terms of achieving reliability goals; 3) general agreement on reliability measures for test; 
4) whether “gates” are established or entrance criteria imposed on systems before 
entering an operational test; and 5) an assessment of whether success in early reliability 
testing correlates with reliability achieved in the actual operational test of a system. 

1. Resources 

Objective : PMs continually make trade-off decisions in terms of cost, schedule, 
performance, and supportability in order to achieve overall program objectives. Often, a 
PM does not have adequate time or dollars to do the necessary levels of reliability testing 
to achieve confidence in the system. To get a sense of this for PEO IEW&S programs, 
survey participants were asked whether the amount of time and funding allotted for 
reliability testing was sufficient for their programs. Responses are provided in Table 19. 


Survey Responses : 


ADEQUACY OF RELIABILITY RESOURCES 

Current Schedule and       Could Use More Time/$$      No Significant Reliability 
Available Funds are        to Reduce Reliability       Effort at This Time 
Sufficient                 Risk 

         3                          9                           6 
        17%                        50%                         33% 


Table 19. Adequacy of Reliability Resources in a Program 

Summary : The most common response (50%) was that PMs, in general, could use 
more time and dollars if available to reduce reliability risks. Programs that responded 
otherwise were either fielded systems that had a minimal reliability program, or programs 
early on in development. It is towards the end of development and prior to formal 
testing, when time and dollars become scarce, that programs tend to adjust reliability 
efforts downward. 

Illustrative Examples : 

1) Thermal Weapon Sight. Reliability testing is unfortunately often traded off for 
cost and schedule. During the TWS Engineering and Manufacturing Development 
(EMD), the PM went directly to the OT without completing the contractor reliability test 
in order to meet cost and schedule goals. The net result was that the system achieved less 
than 10% of the reliability requirement, and the OT was changed to a EEIT, with 
follow-on OT being required. In production on this same program, the contractor chose a 
one-failure test plan for the RQT in order to meet cost and schedule. The end result was 
that the TWS finally passed only on its fifth RQT attempt. The contractor failed to 
adequately consider the risks associated with the chosen test plan, and thus chose a 
high-risk plan in an attempt to meet schedule and reduce test costs. 
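
The risk carried by a test plan that tolerates very few failures can be quantified. The 
following is a minimal sketch, assuming a constant failure rate (so that the failure count 
in a fixed-length test is Poisson distributed) and assuming that a “one-failure” plan means 
the test is passed with at most one failure. The function name, test length, and MTBF 
values are hypothetical and are not TWS data. 

```python
import math


def prob_pass(test_hours, true_mtbf, max_failures):
    """Probability of seeing at most `max_failures` failures in a fixed-length test.

    Assumes a constant failure rate, so the failure count is Poisson
    distributed with mean test_hours / true_mtbf.
    """
    mean = test_hours / true_mtbf
    return sum(
        math.exp(-mean) * mean ** k / math.factorial(k)
        for k in range(max_failures + 1)
    )


if __name__ == "__main__":
    test_hours = 1000.0  # hypothetical test length
    for true_mtbf in (500.0, 1000.0, 2000.0):  # hypothetical true MTBF values
        p = prob_pass(test_hours, true_mtbf, max_failures=1)
        print(f"true MTBF {true_mtbf:>6.0f} h -> P(pass) = {p:.2f}")
```

Under these assumptions, a system whose true MTBF merely equals the test length would 
still fail such a plan roughly one time in four, which illustrates why a plan with so little 
margin carries high schedule risk. 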

2) Second Generation FLIR (SGF). During the SGF EMD phase, both an RDGT 
and an RQT were initially planned in the original contract. The program ran out of 
time and dollars and had to rebaseline, and so the RQT was changed to a fixed-length 
demonstration test of 2,000 hours, and the separate RDGT was eliminated and basically 
combined with the fixed-length test. Although the tank sights did not meet established 
reliability, the most critical sub-element, the SGF HTI B-Kit, did have very good 
performance, exceeding the requirement in OT. The overall reliability performance was 
accepted “as-is” after significant cost and schedule impacts. [Ref. 26] 

2. Testing to Determine Reliability Performance Compliance 

Objective : In theory, reliability performance of a system should be continually 
assessed throughout its lifecycle. Programs sometimes fall into a common trap of 
assuming reliability is what the contractor states it to be, or reliability is treated as a “final 
exam” rather than a sequence of test events to learn from. The focus of this next question 
is to establish what types of test activities PMs use to determine reliability performance 
progress and compliance in a program. Results are summarized in Table 20. 


Survey Responses : 


Types of Test Activities PMs Use to Determine            Program 
Reliability Performance Progress & Compliance            Responses       % 

Environmental Testing/ESS                                    17          94% 
Accelerated Testing (e.g. HALT)                               7          39% 
Reliability Development Growth Test (RDGT)                    7          39% 
Reliability Qualification/Demonstration Test (RQ/DT)         11          61% 
Government Development Test (DT)                              8          44% 
Operational Testing (e.g. LUT/OPTEMPO/IOTE/FOTE)             16          89% 
Acceptance Test/Production Verification Test                  2          11% 
Maintenance Demonstration                                     2          11% 


Table 20. Test Activities Used to Determine Reliability Objectives 


Summary : Environmental Tests, Reliability Qualification Tests, and 
Operational Tests are the three primary venues used by PMs to determine progress and 
compliance with respect to reliability performance. Some of these tests, for example 
environmental testing, can be conducted separately, as part of a Government DT, or 
post-production as part of a lot-sampling acceptance test technique. 

3. Agreement on Reliability Measures for Test 

Objective : It is extremely important to have a common understanding by all 
parties (PM, User, Contractor, and Tester) on the relationship between the contractual 
reliability and the operational reliability requirements of a system. The fact is that 
reliability parameters expressed by operational Users and ones specified in contractual 
documents take on many different forms, and so there needs to be a general 
understanding of the crosswalk between the two. The CBTDEV will typically define 
reliability in terms of operational availability and mission duration needs, while the 
MATDEV in turn takes these parameters and allocates them to technical reliabilities of 
systems and subsystems, i.e. MTBF or another similar measure. The challenge for a PM is 
to ensure that the contractual reliability of the system, usually measured in controlled 
conditions, supports the very dynamic and often unpredictable environment in 
which operational reliability is measured. Getting that right is crucial to the success of a 
program. Given the above, survey participants were asked if all parties (PM, User, 
Contractor, and Tester) were in agreement with the method (model) used to determine 
reliability performance during testing. Survey responses are provided in Table 21. 

Survey Responses : 


Have the PM, User, Contractor, and Tester Agreed       Program 
Upon Common Terms for Measuring Reliability?           Responses       % 

Yes                                                        12          67% 
No                                                          4          22% 
Not Sure                                                    2          11% 

Table 21. Agreement on Reliability Measurement for Test 


Summary : Two-thirds of all programs do, in fact, have agreement among all 
parties concerning the appropriate reliability measures for test. 

4. Initial Operational Test & Evaluation (IOTE) Entrance Criteria 

Objective : One approach for maximizing the chances of successfully meeting 
reliability requirements in IOTE with the requisite level of confidence (usually 80%) is to 
establish entrance criteria for a system. This can be a self-imposed risk reduction 
approach by the PM, or it is often required by the independent Tester/Evaluator to 
ensure that the system has a reasonable probability (defined here as greater than or equal 
to 50%) of successfully passing its IOTE. Striving to meet reliability entrance criteria 
implies that reliability is being tested in DT or in other test events and that there is a well 
laid out developmental effort, with emphasis on reliability designed “up-front” and 
sufficient testing programmed to mature and validate required reliability 
levels. All those surveyed were asked if their program had specific IOTE entrance 
criteria with respect to reliability. Survey responses are provided in Table 22. 

Survey Responses : 


Does Your Program Have Reliability       Program 
Entrance Criteria for IOTE?              Responses       % 

Yes                                          10          56% 
No                                            7          39% 
Not Sure                                      1           5% 


Table 22. Reliability Entrance Criteria for IOTE 


Summary : A majority (56%) of programs surveyed have or had 
reliability entrance criteria established with respect to IOTE. Some of those that did not 
indicated that reliability performance results achieved during DT and at other test events 
were briefed at Operational Test Readiness Reviews (OTRRs). 

Illustrative Examples : 

1) Shadow TUAV. The IOTE entrance criteria varied, depending on the program. 
For example, the Shadow TUAV system must demonstrate the ability to operate for 
12-18-18-18-8 hours over a 5-day period in accordance with its OMS/MP. The JTT/CIBS-M 
program must demonstrate successful progress in its RDGT. Other programs 
within PM NV/RSTA have entrance criteria requirements in terms of MTBOMF and 
MTBEFF with varied levels of confidence. In some programs, for example BCIS, there 
were no IOTE entrance criteria with respect to reliability because there were 
not enough hours in IOTE for it to be a statistically significant event for reliability. 
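
The remark that a test may contain too few hours to be statistically significant can be 
made concrete. The sketch below, offered under stated assumptions rather than as the 
evaluators' actual method, computes a one-sided lower confidence bound on MTBF for a 
time-terminated test with a constant failure rate. The function names and the test hours 
are hypothetical; the 80% default reflects the confidence level typically required for IOTE. 

```python
import math


def poisson_cdf(failures, mean):
    """P(N <= failures) for a Poisson count with the given mean."""
    return sum(
        math.exp(-mean) * mean ** k / math.factorial(k)
        for k in range(failures + 1)
    )


def mtbf_lower_bound(test_hours, failures, confidence=0.80):
    """One-sided lower confidence bound on MTBF for a time-terminated test.

    Finds the expected failure count m at which observing `failures` or
    fewer would happen with probability 1 - confidence, then returns
    test_hours / m. Assumes a constant failure rate.
    """
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    while poisson_cdf(failures, hi) > alpha:  # bracket the root
        hi *= 2.0
    for _ in range(200):  # bisection; the CDF decreases as the mean grows
        mid = (lo + hi) / 2.0
        if poisson_cdf(failures, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return test_hours / hi


if __name__ == "__main__":
    test_hours = 200.0  # hypothetical operational test length
    for failures in range(4):
        bound = mtbf_lower_bound(test_hours, failures)
        print(f"{failures} failures in {test_hours:.0f} h -> "
              f"80% lower bound on MTBF = {bound:.0f} h")
```

With zero failures the bound reduces to T / ln(1 / (1 - C)), about 0.62 T at 80% 
confidence, so under these assumptions a short test cannot demonstrate a requirement 
that is longer than a fraction of its own length. 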

5. Correlation of Early Test Results with IOTE Success 

Objective : Testing early and often for reliability, within the fiscal realities of a 
program’s budget, is key to gaining early knowledge and is used for correcting 
deficiencies in a system prior to its operational test event. Survey participants were asked 
whether prior success in reliability performance testing during DT or other events 
correlated with a success in IOTE. Responses are summarized in Table 23. 

Survey Responses : 



Correlation of Early Reliability Test Results                 Program 
With IOTE Results?                                            Response       % 

Yes, success in pre-IOTE reliability testing led to 
requirements being fully met in initial IOTE.                     5          28% 
Not completely, system did well in pre-IOTE testing 
but had some problems in initial IOTE                             1           6% 
Not at first, system passed IOTE after attempts.                  5          28% 
N/A, system either not yet involved in an operational 
test or the OT did not assess reliability.                        7          38% 

Level of ORD Reliability            Initial DT       Initial OT 
Requirement Demonstrated            Results          Results 

100 %                                   8                5 
80 %                                    4                3 
60 %                                    2                - 
< 40 %                                  -                3 


Table 23. Correlation of DT Reliability Testing with OT Success 


Summary : Programs that experience success in pre-IOTE reliability testing do 
not always enjoy success in IOTE. Of the 18 systems surveyed, 7 had either not yet gone 
through their operational test, or the number of operational test hours was not sufficient 
to be statistically significant for evaluating reliability. Of the remaining 11 systems, 
five did not successfully pass their IOTE on their first attempt, with problems at least 
partially attributed to reliability issues. In one program, the system had fully 
demonstrated its reliability requirement during DT and other testing, only to achieve 
around 40% of its requirement once it went to OT. 

Illustrative Examples : 

A number of PEO IEW&S programs experienced reliability problems 
during their initial (and some subsequent) operational tests. Some examples follow. 

• The Shadow TUAV system entered into an IOT in Apr 01. After two air 
vehicles crashed early on, the test was reduced to a Limited User Test (LUT) and 
subsequently halted. The system's performance was due to a combination of factors: 
training, crew errors, and reliability problems. Prior to the IOTE, Shadow had 
fully achieved its MTBSA requirement of 20 hours in an OPTEMPO test 
that also demonstrated the ability to meet its OMS/MP 
of 12-18-18-18-8 hours over a 5-day period. Reliability during the shortened 
IOTE was assessed by the Project Office at approximately 8 hrs MTBSA. 

• The Common Ground Station achieved only 11 hrs MTBSA vice its 
requirement of 48 hrs in its first operational test. The system improved some 
during its second OT, and finally met its reliability requirement in the third OT. 

• The Thermal Weapon Sight (TWS) failed its IOTE in February ’00, and 
four subsequent RQTs before finally passing on its fifth attempt. 


J. THE IMPACT OF ACQUISITION STREAMLINING AND DOWNSIZING 

Purpose : With the advent of acquisition reform came a strong push towards 
achieving the most efficiency possible by “reengineering” the way we do business in the 
defense acquisition environment. Military specifications and standards were no longer 
acceptable and performance-based contracting became the best practice. During the same 
period, government downsizing occurred and doing more with less was the norm. The 
question must therefore be asked: is there a downside to this at all? Perhaps not, but the 
question was put forth to get a sense of the pulse from those in the reliability community. 

Objective : The focus of this question was to get feedback and opinion from 
people who work reliability performance management within the PMOs to see if there 
have been any perceived adverse effects with respect to reliability due to the shift to 
performance specifications, increased use of COTS, and government downsizing. The 
responses in Table 24 represent the opinions of those that participated in the survey. 


Survey Responses : 


In your opinion, has the move towards performance-based 
specifications, the increased use of COTS, and/or the 
continued trend of Government downsizing had any              Program       % of 
negative effects on reliability of systems?                   Responses     Programs 

Yes, due to performance based specifications.                     7           39% 
Yes, due to downsizing the workforce.                             2           10% 
Yes, due to both acquisition streamlining and downsizing          7           39% 
No                                                                1            6% 
No comment                                                        1            6% 
COTS/NDI components do not live up to OEM claims                  3           17% 

Table 24. The Impacts of Acquisition Streamlining and Downsizing 

Additional Survey Comments (Paraphrased) : 


• The Government is losing/has lost reliability expertise at the PMO level. 
Also, using COTS products increases risk in the area of reliability for weapons 
platforms in a military environment. 

• The inability to state specifically the reliability tools and the level of detail 
desired allows a contractor to minimize their reliability effort. 

• The Government has lost the majority of the expertise to manage reliability 
effectively and acquisition reform has resulted in vague requirements that cannot 
be demonstrated by contractors. 

• There has been a complete turnaround with regards to the importance of 
DEMONSTRATING reliability requirements. Not enough time or money to 
accomplish requirements that have no backing. 

• Reliability testing is too expensive and cannot be adequately resourced. We 
(PM) do not have enough qualified personnel to be dedicated to reliability, plus, 
we do not follow up after the system is fielded to track failures. 

• A COTS approach does not necessarily equate to high reliability in a military 
environment. 

• Restrictions in the ability to specify the test method sometimes results in an 
inappropriate methodology being employed which has to later be negotiated out 
of the contract. Also, the requirement to state reliability as a probability resulted 
in a number of problems as well. 


Suggestions for Improvement from Survey Respondents (Paraphrased) : 

• Allow the government to place the hard reliability requirements back in the 
contract language. 

• Reinstate the RAM rationale process; issue binding policy for reliability; state 
reliability in terms demonstrable by contractors; require reliability program 
plans; hold PMs (as part of their rating) accountable for reliability; make 
reliability a KPP; and budget adequate funds for reliability. 

Summary : There are some strong emotions concerning this subject. A 
significant majority of respondents (89%) are of the opinion that acquisition 
streamlining and/or workforce downsizing have in some way contributed to the 
state of reliability within the Army today. Reasons given include loss of 
government expertise in the area of reliability, inappropriate use of COTS in a 
military environment, and lack of resources dedicated towards reliability. 
Approximately 40% of respondents believe the Army community must 
compensate with alternative policies, processes, and tools. 

K. COMMERCIAL PRACTICES 


Purpose : The focus of this survey question is to see what best commercial 
practices in reliability assurance are being applied to the acquisition of military systems. 

Objective : The responses summarized in Table 25 establish the type and extent 
of commercial best practices employed by program management offices. 

Survey Responses : 


What Types of Commercial Reliability Assurance                   Program 
Practices Do You Employ in Your Program?                         Responses       % 

Physics of Failure (PoF)                                             1           6% 
Predictive Models                                                    3          17% 
Prognostics/Life Consumption Monitoring                              -           0% 
Identification and Mitigation of Failure Modes (e.g. FMECA)          4          22% 
Accelerated Life Testing (e.g. HALT)                                 3          17% 
Reliability Growth Testing                                           6          33% 
Reliability-Driven Parts Selection/Control                           5          28% 
Other                                                                -           0% 
Do Not Employ any Commercial Practices                               5          28% 


Table 25. Use of Commercial Reliability Assurance Practices 


Summary : Most programs (72%) employ some type of tool or 
technique drawn from commercial best practices and methods to assure that 
high-reliability products can be manufactured. 

L. MAINTAINING AND IMPROVING RELIABILITY IN THE FIELD 

Purpose : AR 70-1 states that “PMs are to track fielded system’s failure and repair 
histories starting at First Unit Equipped (FUE) ... and should focus on the identification 
of operating and support cost drivers that lead to improvements where they are cost 
effective.” The next series of questions in the survey were posed to determine the extent 
to which PMs track and manage reliability performance post-fielding, and whether a data 
collection system is in place to support focused and cost effective improvements once a 
system is fielded. 

Objective : The responses are intended to demonstrate whether PMOs are 
adequately engaged in tracking and improving a fielded system’s reliability. Table 26 
summarizes the survey responses in six separate areas: 1) conditional materiel release; 2) 
formalized system for collecting field reliability data; 3) status of reliability performance 
in the field; 4) cost effective reliability improvements in O&S; 5) formal reliability 
improvement programs; and 6) Reliability Centered Maintenance (RCM). 


Survey Responses : 


RELIABILITY OF FIELDED SYSTEMS                                    YES            NO 
                                                                 #     %       #     % 

Conditional Materiel Release (CMR) 
  Was the system initially fielded with a CMR due to 
  reliability shortfalls?                                        2    20%      8    80% 
  Is the CMR still in effect?                                    1    10%      1     - 
  N/A                                                            -     -       8    80% 

Collection of Field Reliability Data 
  Reliability information is obtained from Depot, 
  Contractor Logistics Support (CLS) records, or other 
  means (e.g. Production Quality Deficiency Reports, PQDRs)      4    40%      6    60% 
  Warranty collection data provides information on 
  reliability performance                                        3    30%      7    70% 
  A formal collection system does not exist                      6    60% 

Status of Reliability in the Field 
  System performance meets/exceeds ORD                           4    40% 
  System performance is less than ORD                            -     - 
  Do not know (due to lack of data, or too early)                6    60% 

Cost Effective Reliability Improvements 
  Has collection of reliability failure data in the field 
  led to any cost effective improvements?                        3    30%      5    50% 
  Too early in program to tell                                   2    20% 

Reliability Improvement Program 
  Is there a formal reliability improvement program?             1    10%      9    90% 

Reliability Centered Maintenance (RCM) 
  Is there a formal RCM program?                                 -     0%     10   100% 


Table 26. Reliability of Fielded Systems 

Summary : There appears to be a general lack of a systematic process for 
collecting reliability trend data. Repair or warranty data can be obtained on almost any 
system, either through contractor logistic support or the Army maintenance databases, 
but there is no process in place to actually go in, examine reliability trend data, and feed 
that data back to the contractor for corrective action. Of the ten fielded PEO IEW&S 
systems surveyed, 60% do not have a formal reliability data collection system in place. It 
is not surprising, then, that 60% also do not know if fielded system performance is 
meeting the ORD requirement. Only 30% of the systems use field data to implement 
cost-effective changes that improve reliability. Only one program has a formal reliability 
improvement program, and none has a Reliability Centered Maintenance program. 
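
As an illustration of what examining reliability trend data could involve, the sketch 
below applies the Laplace trend test to a list of field failure times. This is a standard 
statistical technique brought in here for illustration; it was not reported by any surveyed 
program. The function name, failure times, and observation period are all hypothetical. 

```python
import math


def laplace_trend(failure_times, observation_hours):
    """Laplace trend statistic for failures observed over a fixed period.

    Values well above zero suggest failures are bunching late in the period
    (reliability degrading); values well below zero suggest improvement.
    """
    n = len(failure_times)
    if n == 0:
        raise ValueError("at least one failure time is required")
    mean_time = sum(failure_times) / n
    return (mean_time - observation_hours / 2) / (
        observation_hours * math.sqrt(1.0 / (12 * n))
    )


if __name__ == "__main__":
    # Hypothetical cumulative operating hours at which failures were reported.
    times = [100.0, 300.0, 450.0, 520.0, 580.0]
    period = 600.0
    print(f"point-estimate MTBF: {period / len(times):.0f} h")
    print(f"Laplace statistic:   {laplace_trend(times, period):.2f}")
```

A statistic near zero is consistent with a stable failure rate. In practice the value would 
be compared against a normal-distribution threshold before concluding that a trend 
exists, and a handful of failures, as in this example, gives only weak evidence either way. 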

Illustrative Examples : 

1) SGF Program. In the SGF contract, quarterly failure review boards are held to 
examine all the field return data, and address corrective actions. 

2) Image Intensification (I2) Systems. Production Quality Deficiency Reports 
(PQDRs) are a formalized system and a means for soldiers in the field to report a problem 
or issue, and give feedback on systems. One drawback is that this method generally takes 
6-8 months to close out. After the system gets sent back to the vendor, it is investigated, and 
a corrective action is applied if necessary or warranted. Most soldiers in the field, 
however, do not fill out a PQDR if the I2 tube is still under warranty. Some may not 
know about the system or feel it is too much trouble to fill out. 

3) Hunter TUAV. The Hunter program implements a RAM system that includes: 

• Failure Reporting, Analysis and Corrective Action System (FRACAS) 

The prime contractor maintains a closed-loop FRACAS. The FRACAS database is 
available on-line to the Government. Failures involving flight critical performance or 
safety impacts have priority for corrective action. The prime contractor establishes and 
maintains on-line files to track the status of high priority corrective action requests derived 
from FRACAS activities. 

• RAM Data Assessment 

The prime contractor/Government performs assessment of the RAM data 
available in the FRACAS database. Assessments are limited to failure characterization. 

• Failure Review Board (FRB) 

The prime contractor/Government jointly conduct FRB meetings at regular 
intervals (with intervals established by Government and contractor in the IPT) to review 
failure data and to track high priority failures through the FRACAS process. The FRB 
evaluates reported failures for criticality of performance and safety impact and establishes 
priority for corrective action. 

M. CHAPTER SUMMARY 

This chapter provided data gathered from surveys, interviews, and information 
collected from various PMs within the PEO IEW&S organization. These programs 
provide a fairly representative cross-section of experiences with respect to weapon 
system reliability performance management due to their diversity in ACAT levels and 
acquisition phases. The survey addressed 20 questions regarding important issues with 
respect to reliability management, with responses grouped into eight main themes for 
ease of data presentation. The responses provided good insight into the practices 
employed by each PMO to manage reliability performance risks in their 
programs. Furthermore, survey responses were augmented with some examples of 
reliability program management experiences to illustrate real-world challenges and 
concerns that PMs are often confronted with in managing the reliability 
“tradespace” of weapon systems. 

The next chapter discusses the eight main reliability themes and focuses on key 
issues, barriers, and risk mitigation techniques and strategies for maximizing the inherent 
reliability performance of weapon systems. The analysis is aligned around the research 
questions in Chapter I and based on respondents’ answers presented in this chapter. 

IV. RELIABILITY MANAGEMENT ISSUES, ANALYSIS AND LESSONS LEARNED 


A. INTRODUCTION 

This chapter provides an analysis of central issues that are common to PMs with 
respect to achieving weapon system reliability performance, and evaluates the general 
“state of reliability” within PEO IEW&S. The analysis is based on current program data 
and survey responses provided by participating PMOs, and is structured around the eight 
reliability management themes described in Chapter III. Lessons learned based on 
background data and information derived from survey responses are provided at the end of 
this chapter. 


B. KEY RELIABILITY MANAGEMENT ISSUES

Poor system reliability can be the cause of significant schedule delays and
program overruns, and can also have debilitating effects on warfighting readiness. While this
research focused specifically on weapon systems developed by PEO IEW&S, the issues
portrayed and the resultant findings may generally be applied to a broader set of
programs throughout the Army and DoD. For the purposes of this thesis, analysis of the
issues related to reliability is presented in accordance with the eight reliability
management themes described in Chapter III:

• Management Approach to Reliability

• Influencing Reliability Requirements 

• Contracting and Incentivizing for Reliability 

• “Designing-in” Reliability Upfront 

• Development and Operational Test Experiences

• The Impact of Acquisition Streamlining and Downsizing 

• Commercial Practices 

• Maintaining and Improving Reliability in the Field

C. ANALYSIS OF KEY RELIABILITY MANAGEMENT ISSUES

1. Management Approach to Reliability 

So why do we struggle with reliability? Why do our weapon systems trend
towards failure more often than success when it comes to achieving their reliability
requirements? The state-of-the-art technology that we deal with can certainly be cited
as one factor, but not the driving one in my opinion. Yes, it is true that the night vision
systems that we integrate on ground combat systems have complex optics and intricate
focal plane arrays, that air vehicle platforms impose their own set of reliability
challenges on our sensor systems, and that our systems must operate in harsh environmental
conditions, but is that really it? The limits of technology and the capabilities it brings to
our systems are expanding each year, and that must be recognized, but in the larger
analysis, it all comes down to how we manage.

Key Reasons Why We Fail. The survey responses citing the “Top 10” Army
reliability issues, and answers to the other survey questions for that matter, center around
five main causes, all of which have more to do with a lack of proper management than they do
with technology challenges:

• Unrealistic Requirements - There is a dialogue disconnect between the 
MATDEV and the CBTDEV on reliability requirements. 

• Poor Planning - Reliability growth is not widely utilized as a tool to 
reduce reliability related design issues early on. 

• Overall Poor Design - Reliability is not being “designed-in” upfront in 
our weapon systems. 

• Inadequate Testing - Testing is too little/too late and is typically 
shortchanged as funds and schedule become tight. 

• Lack of Qualified Personnel - Downsizing has left a gap in qualified 
reliability experts. 

The following paragraphs present some additional noteworthy points:

Responsibility. Who is responsible for reliability within a PMO? Two-thirds of
all programs surveyed address issues related to reliability in either a Reliability Integrated
Product Team (IPT) or a Logistics IPT. This may not necessarily be a positive thing, as
it could effectively isolate reliability engineering to only those IPTs. Instead,
reliability should be the responsibility of all IPTs to get at the actual sources of the
problem, rather than have one IPT addressing only the symptoms.

Planning & Documentation. Only a small percentage (17%) of programs within
PEO IEW&S have a comprehensive document that identifies details of a Reliability
Program Plan (RPP) for their system. Of those that do have one, none were reviewed for 
content as part of this thesis research, but a good plan should detail all of the reliability 
activities, functions, processes, test strategies, measurement/metrics, data collection, 
resources and timelines required to ensure system reliability objectives are achieved 
within the program. 

Reliability Growth. Two-thirds of all programs surveyed do not utilize reliability
growth testing (RGT) as a mechanism for continuously gaining knowledge on their system.
The reality is that reliability performance is not always continuously assessed, or worse
yet, the emphasis on reliability oftentimes comes too late in a program. One example is a
program where the contractor’s engineering estimates and models were accepted as fact,
and no formal reliability testing was ever conducted. In another, reliability was not
assessed until the IOTE event, and the results were well below the requirement and hence
the system failed. Still another did not initially assess reliability until very late, at its
first RQT. All sadly had the same results, and due to lack of early testing, it cost these
programs valuable time and money to correct the problems, perhaps even more than
had they invested upfront. A reliability growth approach allows a program to
demonstrate trends towards achieving reliability objectives, and to implement corrective
actions early on while the design is not yet locked in. The opposite is true if
you wait until IOTE or an RQT to test reliability, both of which are “fixed” configuration
tests that are more of a final exam than a useful learning event.
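
To make the contrast concrete, a growth-tracking calculation can be sketched in a few lines of Python. The sketch below fits a Duane-style log-log trend to cumulative test data. The Duane model is an assumption chosen here for illustration (the methods used by the surveyed programs were not specified), and the test hours and failure counts are hypothetical.

```python
import math

def duane_fit(cum_hours, cum_failures):
    """Fit ln(cumulative MTBF) = intercept + alpha * ln(cumulative hours).

    alpha is the growth slope: a value near zero means little learning,
    while a larger positive value means corrective actions are taking hold.
    """
    xs = [math.log(t) for t in cum_hours]
    ys = [math.log(t / n) for t, n in zip(cum_hours, cum_failures)]
    count = len(xs)
    x_bar = sum(xs) / count
    y_bar = sum(ys) / count
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    intercept = y_bar - alpha * x_bar
    return alpha, intercept

def projected_cum_mtbf(hours, alpha, intercept):
    """Extrapolate the fitted trend to a future cumulative test time."""
    return math.exp(intercept + alpha * math.log(hours))

# Hypothetical data: cumulative test hours and cumulative failures
# recorded at the end of four test phases.
cum_hours = [100, 300, 600, 1000]
cum_failures = [5, 11, 18, 25]

alpha, intercept = duane_fit(cum_hours, cum_failures)
print(f"growth slope alpha = {alpha:.2f}")
print(f"projected cumulative MTBF at 2000 h = "
      f"{projected_cum_mtbf(2000, alpha, intercept):.1f} h")
```

Under these assumptions, a program could compare the projected value against its requirement after each test phase. A flat or negative slope would be an early signal that the design is not maturing, well before a fixed-configuration test.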

Tracking Reliability Progress. If an RGT program is not employed on a program,
other methods must be used to measure progress towards reaching the reliability
objective. For PEO IEW&S, the majority of programs track reliability progress by
using contractor projections, test events, or major program reviews. This may be a
prudent approach for those systems that are incorporating COTS/NDI components, but
for other programs that are pushing the envelope with respect to the state of the art,
reliability must be tracked and managed at a more detailed level.

2. Influencing Reliability Requirements 

From the data gathered on the 18 systems, it appears that for the most part there is 
a healthy dialogue between the PM and the User with respect to reliability inputs into the 
requirements process. This is somewhat in conflict with results of the “Top 10” list from 
the same pool of respondents that ranked “unrealistic reliability requirements/rationale” 
as the #5 problem. This may be because, while there is an exchange on
reliability between the two communities as well as the test community, it may be a less
formal process than previously practiced. Requirements are no longer developed as
part of RAM working groups that provided the comprehensive basis or rationale for the
numbers.

ORD Reliability Parameters. According to DoD 5000.2-R, reliability
requirements are to address mission reliability and logistics reliability. By definition this
implies that ORD reliability requirements should focus on measures related to completing
a mission, and minimizing logistics demands. Only 7 of 18 (39%) programs within PEO
IEW&S have reliability measures tied to an operational availability requirement. The
other programs have stated reliability requirements that are primarily performance
parameters and are not tied to mission or supportability measures such as operational
readiness/availability, reduced logistics footprint, manpower, and spares levels.
Part of the systemic problem is having the COMBATDEV define something
that in reality is up to the MATDEV to allocate through the system engineering process. 
The COMBATDEV focus should be on defining acceptable levels of mission failure 
while leaving the technical solution and reliability thereof to the MATDEV. 

Reliability as a KPP. Reliability is not regarded in the same fashion as traditional
performance factors in that very rarely is reliability ever identified as a Key Performance
Parameter (KPP) of a system. Only three (17%) of PEO IEW&S programs had
reliability identified as a KPP in the ORD. There are several possible reasons why this
may be so. One is that both the CBTDEV and MATDEV may feel it is too early in a
program life cycle to designate a definitive KPP tied to reliability. Another reason may
be that mandating reliability as a KPP reduces a PM’s precious trade space and constrains
the PM’s flexibility.

3. Contracting and Incentivizing for Reliability

Translating ORD Requirements to Contractual Requirements. If ORD
requirements fall short regarding definition of reliability expectations, then chances
are the contract reliability requirements will be just as inadequate. There must be a clear
“link” between operational and contractual reliability, one that allows for conclusive and
accountable proof of results. The challenge is to crosswalk contractual reliability
requirements (typically assessed in a static environment, contractor’s plant, controlled
test/climate) with operational reliability requirements (measured in a dynamic
environment, soldiers operating the system, dirty battlefield). Failure to do this will
significantly increase the risk of program failure. It is clear that achieving the
reliability required in the ORD requires a higher reliability to be specified on contract.
This helps to account for the human and environmental differences between
lab testing and soldiers operating systems in the field. That being said, 5 out of
18 programs that participated in the survey simply restated the ORD requirement as the
contract requirement. This approach could have considerable downstream
consequences, whereby the demonstrated levels of reliability performance could fall
significantly short of the stated ORD requirement.

Contract Incentives. If we truly are concerned about reliability performance of weapon
systems, it is not evident in our current contracts. Only 1 of 18 programs within PEO
IEW&S is currently even considering a contracting strategy that has incentives tied to
achieving reliability performance objectives. This is a dilemma in that we are not
incentivizing the behavior we seek from our contractors. It may be a cultural thing, or
perhaps we don’t know how to adequately incentivize performance in this area. There
simply does not seem to be a willingness to explicitly pay for reliability, almost as if
reliability were assumed as a given. This is a mindset that we must overcome; otherwise
contractors will continue to have no motivation to produce higher reliability systems.

4. “Designing-in” Reliability Upfront

“Designing in” reliability implies performing upfront analyses during the design
phase so that the inherent reliability of the system is as high as possible. Examples of this
include: 1) reducing the overall number of parts in a system, particularly moving
mechanical parts; 2) incorporating redundancy; 3)
analyzing potential failure modes and mitigating the effect of failures or incorporating
graceful degradation features; 4) doing a part stress analysis; and 5) making sure all of the
chosen parts are de-rated properly. These are but a few examples. As with anything
else, being proactive with reliability early in the lifecycle of a system is, in theory,
more cost effective than dealing with potential schedule delays and unexpected costs of
failing a test later, only to have to redesign and test yet again until the problem is fixed.
All programs in the survey utilize some form of design tool or technique to optimize
reliability early on in a program. The “goodness” of these tools and techniques was not
evaluated, however.
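
The effect of the first two examples, parts-count reduction and redundancy, can be illustrated with basic series and parallel reliability arithmetic. The following Python sketch assumes independent part failures, and the part reliabilities are hypothetical values chosen only for illustration.

```python
from math import prod

def series_reliability(parts):
    """All parts must work: system reliability is the product."""
    return prod(parts)

def parallel_reliability(units):
    """Active redundancy: the system fails only if every unit fails."""
    return 1.0 - prod(1.0 - r for r in units)

# Hypothetical mission reliabilities, assuming independent failures.
ten_parts = series_reliability([0.99] * 10)      # about 0.904
five_parts = series_reliability([0.99] * 5)      # about 0.951
redundant = parallel_reliability([0.90, 0.90])   # about 0.990

print(f"10 parts in series: {ten_parts:.3f}")
print(f" 5 parts in series: {five_parts:.3f}")
print(f"2 redundant units : {redundant:.3f}")
```

Under these assumptions, halving the parts count raises system reliability without improving any individual part, and pairing two modest units outperforms either one alone. Redundancy also adds parts to maintain, which is one reason these choices belong in the upfront design trades.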

5. Development and Operational Test Experiences

Reliability does not always have the emphasis, resources, or attention it requires 
to ensure mission success in a program. As evidenced by one night vision program, PMs 
are often forced to trade off reliability when their program is squeezed for schedule or is
tight on funds. The “saved funds or schedule” are usually “bought” back later when 
problems arise in the system. 

Entrance Criteria. Over half of the programs surveyed had IOT&E entrance
criteria tied to demonstration of specific reliability performance. Whether this is a
mandate from higher leadership, or self-imposed by the PM, it is a good practice to abide
by. To be relevant, demonstration of reliability performance should adequately duplicate
the Operational Mode Summary/Mission Profile that is expected in the operational test.
ATEC statistics find that 61% of programs that successfully demonstrate their reliability
requirement prior to operational test enjoy a 65% success rate (meeting reliability
requirements) during the actual OT. Conversely, those systems failing to achieve
reliability requirements in DT have an 82% failure rate in OT. The bottom line is that
demonstration of reliability requirements in DT or other early test events can enhance the
chances for success in OT. An analysis of PEO IEW&S systems does not support this
hypothesis one way or the other. Twenty-eight percent of programs that had reliability
performance successes in DT prior to IOTE went on to pass the test event, and 28%
failed their IOTE event after successful DT testing.

6. The Impact of Acquisition Streamlining and Downsizing 

Although subject to much debate for sure, and granted responses to this line of 
questioning are opinion rather than fact, an overwhelming majority felt that acquisition 
streamlining, workforce downsizing, and use of COTS all had some level of influence on 
reliability. This is purely a qualitative rather than quantitative assessment, based on the 
personal views of those surveyed. Eighty-eight percent of all survey responses were of the
opinion that acquisition streamlining and downsizing had some negative effect. 
Examples cited include: 

• Loss of reliability technical expertise as a consequence of both natural
attrition and government imposed reductions. 

• Lack of definitive contract requirements for use of reliability tools.

• Reliability performance gets “lost” in trade space. 

• Concerns with how to define enforceable performance-based 
reliability requirements. 

• Reliability testing has been marginalized due to cost constraints and 
personnel cuts. 

Rather than blame acquisition reform and changes in how we do business today 
for current reliability shortfalls, it is more appropriate to recognize the need for increased 
training, alternative policies, new processes and tools. 

7. Commercial Practices

Best commercial practices in reliability include physics of failure, predictive 
technologies, prognostics/life consumption monitoring, identification and mitigation of 
failure modes/mechanisms (FMECA), accelerated life testing, growth testing, and
selection of reliable parts, to name a few. None of these commercial practices appear to
be utilized to any great extent in the 18 programs surveyed. Greater use of these tools
will reduce the risk of test failure, decrease the need for retest, and minimize corrective
actions.

8. Maintaining and Improving Reliability in the Field

The Army measures reliability after fielding using two different terms for
availability rates: 1) operational availability, and 2) fully mission capable. Operational
availability (Ao) has been previously defined as the probability that an item is in an 
operable and committable state at the start of a mission when the mission is called for at a 
random point in time. A system is fully mission capable (FMC) when it can perform all 
of its combat missions without endangering the lives of crew or operators. The terms 
ready, available, and fully mission capable are often used to refer to the same status:
equipment is on hand and able to perform its combat missions. FMC percent is total 
available days divided by possible days and multiplied by 100. The problem with this 
measure is that you can have an artificially high mission capable rate with excessive 
sparing, at the sacrifice of a larger logistics footprint. What really is needed to get a true 
indication of reliability performance in the field is a combination of FMC with MTBM or 
mission reliability with logistics reliability. 
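
The point about sparing can be made concrete with a small calculation. The Python sketch below computes FMC percent as defined above alongside MTBM, taken here as operating hours divided by maintenance actions. The two fleets and all of their numbers are hypothetical.

```python
def fmc_percent(available_days, possible_days):
    """FMC percent = available days / possible days * 100."""
    if possible_days <= 0:
        raise ValueError("possible_days must be positive")
    return 100.0 * available_days / possible_days

def mtbm_hours(operating_hours, maintenance_actions):
    """Mean time between maintenance (assumed: hours per maintenance action)."""
    if maintenance_actions <= 0:
        raise ValueError("maintenance_actions must be positive")
    return operating_hours / maintenance_actions

# Hypothetical fleets with the same FMC rate over a 30-day period.
# Fleet B reaches it only by swapping in spares after frequent failures.
fleet_a = {"available": 27, "possible": 30, "hours": 600, "actions": 2}
fleet_b = {"available": 27, "possible": 30, "hours": 600, "actions": 12}

for name, f in (("A", fleet_a), ("B", fleet_b)):
    fmc = fmc_percent(f["available"], f["possible"])
    mtbm = mtbm_hours(f["hours"], f["actions"])
    print(f"Fleet {name}: FMC = {fmc:.0f}%, MTBM = {mtbm:.0f} h")
```

Both hypothetical fleets report 90% FMC, yet one needs maintenance six times as often. Reporting the two measures together exposes the difference that FMC alone hides.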

As discussed in the previous chapter, and reinforced by the above, there appears
to be a disparity in how reliability information from the field is measured and collected. Most
PEO IEW&S programs do not have a formal system in place, and rely on sporadic
feedback from the field, CES records or PQDR information. The problem is that this
data is not reviewed or tracked adequately for reliability trends. Another concern is that
there do not appear to be any significant formal reliability improvement initiatives in
place.

D. LESSONS LEARNED 

Based on the survey responses and reliability information provided by PEO
IEW&S systems, the following lessons learned can be extracted.

1. Understand the Requirement

A clear understanding of ORD reliability performance measures as they relate to
mission performance and system readiness is required in order to be successful in a
program. This then needs to be translated into contractual reliability requirements that
are measurable, enforceable, and traceable back to the operational requirement.

2. “Design in” Reliability

Achieving reliable, available and maintainable systems requires a disciplined 
systems engineering approach that starts early on in a program. You cannot “test in” 
reliability no matter how hard you try. Do not just leave reliability up to the engineering
or the logistics disciplines; everyone must be involved in the process.

3. Test Early, Test Often 

Managing reliability growth requires continuous testing at planned intervals to 
gain knowledge and mature the system to ensure successful achievement of reliability 
performance objectives.

4. Check the Underlying Process 

Reliability issues are not always strictly due to the uniqueness of the program, or
technology, or management issues. Check the underlying manufacturing and design processes
of the contractor and their vendors to ensure that they are measurable and repeatable.

5. Prove It 

Predicted reliability performance tends to be overstated. Apply a null hypothesis
to these reliability claims, i.e., that they are untrue until proven otherwise in the form of
valid test results with confidence in the numbers. This does not mean test for testing’s
sake, because that can bankrupt your program. Use available data if it is applicable to
your system. Always have a contractor prove his/her reliability claims.
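
What “confidence in the numbers” costs in test time can be estimated with a simple calculation. The Python sketch below assumes a constant failure rate (exponential times between failures) and a test that ends with zero failures. Both are simplifying assumptions, and the 500-hour MTBF requirement is hypothetical.

```python
import math

def zero_failure_test_hours(mtbf_required, confidence):
    """Failure-free test time needed to demonstrate an MTBF at a confidence.

    With exponential failures, P(no failure in T) = exp(-T / MTBF), so the
    claim "true MTBF is below the requirement" is rejected at the given
    confidence once T >= -MTBF * ln(1 - confidence).
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return -mtbf_required * math.log(1.0 - confidence)

def demonstrated_mtbf(test_hours, confidence):
    """Lower confidence bound on MTBF after a failure-free test."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return test_hours / -math.log(1.0 - confidence)

# Hypothetical requirement: 500 h MTBF.
for conf in (0.80, 0.90):
    hours = zero_failure_test_hours(500.0, conf)
    print(f"{conf:.0%} confidence: {hours:.0f} failure-free test hours")
```

Under these assumptions, roughly 805 failure-free hours would support a 500-hour MTBF claim at 80% confidence, and roughly 1,151 hours at 90%. A contractor prediction backed by far fewer hours than this has not yet been proven in the sense described above.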

6. Maintain a Balance 

High reliability must be balanced with achieving other program objectives in
terms of cost, schedule and performance. Too much reliability can cost just as much as
too little reliability. The challenge is to maintain a balanced perspective when
performing tradeoffs.

7. Follow Up 

Reliability focus does not end with fielding. Feedback from the field concerning
reliability performance is not an automatic thing. Think ahead and plan for how you are
going to collect failure data to identify reliability trends and areas for improvement.

E. CHAPTER SUMMARY

This chapter analyzed current PM practices, issues and challenges in managing
the reliability performance of weapon systems based on program data and results of a
reliability performance survey. The analysis was structured around eight reliability
management themes and attempted to pinpoint either best practices to implement in a
program, or common pitfalls that PMs should avoid. Lessons learned were then provided
based on these experiences and the background data gathered as part of this research.
The final chapter will make some recommendations on how to best approach reliability
performance from a management perspective.

V. CONCLUSIONS AND RECOMMENDATIONS


A. INTRODUCTION 

Research conducted in support of this thesis evaluated the present process for 
managing weapon system reliability performance and identified some of the common 
challenges, pitfalls, and lessons learned encountered by Program Managers today. The 
issues and challenges were derived from surveys, interviews, and information provided 
by PMs within the PEO IEW&S organization.

In this closing chapter, conclusions with respect to the management of weapon 
system reliability performance are identified as a result of feedback and analysis of 
survey responses. In addition, the author makes several recommendations with respect to 
practices and strategies that PMs can employ to maximize the inherent reliability 
performance of weapon systems. Next, brief answers to the primary and secondary 
research questions are provided. Finally, this thesis concludes by providing
recommended areas for further study. 


B. CONCLUSIONS AND RECOMMENDATIONS

Analysis of survey results, interviews and program data provided by PMO
personnel involved in reliability has led the researcher to the following conclusions and
recommendations:

1. Reliability Program Plan

Conclusion: Programs, in general, do not have a structured reliability
management process or a corresponding overarching document that defines the activities,
schedules, test strategies, and resources required to provide effective management insight
into achieving overall reliability objectives of a program.

Recommendation: Require all PMs to develop a Reliability Program Plan
(RPP) that explicitly defines reliability management responsibilities within the
organization; related tasks, activities, and processes; test and verification methods; and the
schedule and resources necessary to achieve reliability system maturation. This should
be considered a mandatory document for all Milestone Decision reviews, similar to what
a Test and Evaluation Master Plan (TEMP) provides in identifying the overall program
test strategies and resources, or a Command, Control, Communication, Computers, and
Intelligence Support Plan (C4ISP) that details a roadmap for achieving interoperability
certification of a system.

2. Continuous Reliability Assessment

Conclusion: Programs do not utilize reliability growth techniques and often test
too little, too late with respect to reliability performance.

Recommendation: DoD should re-evaluate the requirement to achieve certain
technology readiness levels (TRL) in programs by certain milestones and incorporate 
additional criteria linked to reliability maturity levels. 

3. Requirements Clarity

Conclusion: The current process for establishing operational reliability
performance measures in requirements documents is inconsistent, and does not always
link reliability performance to mission or supportability measures as required by DoD
5000.2-R. This can lead to confusion between the MATDEV and CBTDEV and result in
failure to achieve overall desired readiness levels.

Recommendation: Establish standards for defining reliability measures in
ORDs, and reinstate the RAM rationale process to ensure that MATDEVs and CBTDEVs
are jointly defining realistic, achievable reliability requirements. Establish a mechanism
that requires traceability of contractual reliability performance requirements to 
operational reliability requirements. 

4. Reliability as a Key Performance Parameter (KPP)

Conclusion: Reliability oftentimes gets shortchanged and is traded off to meet
cost, schedule and performance objectives. 

Recommendation: Consider making reliability a KPP for certain programs
where appropriate.

5. Incentives


Conclusion: Programs do not adequately incentivize contractors to meet or
exceed contract reliability requirements.

Recommendation: Develop standard contract language that truly incentivizes a
reliability maturation process throughout a system’s lifecycle. Incentives could be tied to
a series of reliability growth demonstrations as the design matures, i.e. beginning with 
reliability predictions, then RDGT, achieving reliability entrance criteria to lOTE, 
demonstrating success at RQT, and through field metrics such as sparing levels or 
warranty returns. After implementing this language, identify a pilot program to 
participate and apply this to 

6. “Design in” Reliability

Conclusion: Programs do not adequately take advantage of commercial tools
and techniques for “designing in” reliability upfront in a program. Done properly, this is 
where significant downstream program savings can be achieved. 

Recommendation: DoD should consider partnering with commercial firms that
develop and employ these tools. 

7. Reliability Entrance Criteria

Conclusion: Programs often fail to achieve reliability objectives during
operational testing due to inadequate upfront reliability testing. 

Recommendation: Establish a standard IOTE reliability entrance criteria
methodology for programs and make it part of the Operational Test Readiness Review 
(OTRR) process. 

8. Reliability of Fielded Systems

Conclusion: Most programs do not have a formal process for collecting
reliability trend information from the field. 

Recommendation: DoD should fund and establish a standardized system for
accomplishing this.

C. ANSWERS TO RESEARCH QUESTIONS

Primary Research Question: What essential steps can a Program Manager
take to better manage weapon system reliability requirements over a program’s life 
cycle, and how can reliability performance be maintained and/or improved once the 
system is fielded? 


The following are recommended steps a PM should consider taking in an effort to
better manage weapon system reliability performance:

• Make sure there is a common understanding between the Program Office and 
the User on what the reliability requirement means in the ORD. 

• “Design in” reliability early on in a program. Do not hope to “test in” 
reliability later. It simply does not work that way. 

• Plan for incremental testing to control reliability growth and gain knowledge 
for incorporation into the system design as it matures.

• Do not shortchange reliability testing for the sake of cost or schedule. It will 
bite you back later. 

• Make sure what you contract for in terms of reliability performance 
adequately supports the operational reliability performance requirement of the 
system. 

• Have a solid plan. 


Subsidiary Research Questions: The following subsidiary questions focused
the author’s efforts in answering the primary research question.

1. What are the predominant underlying factors that contribute to 
reliability performance in Army systems, and how can a Program 
Manager (PM) mitigate risk in these areas? 


There are five main causes that contribute to poor reliability performance:

• Unrealistic Requirements - There is a dialogue disconnect between the 
MATDEV and the CBTDEV on reliability requirements. 

• Poor Planning - Reliability growth is not widely utilized as a tool to reduce 
reliability related design issues early on. 

• Overall Poor Design - Reliability is not being “designed-in” upfront in our
weapon systems.

• Inadequate Testing - Testing is too little/too late and is typically shortchanged
as funds and schedule become tight.

• Lack of Qualified Personnel - Downsizing has left a gap in qualified 
reliability experts. 


2. What are the current policies and regulations that govern reliability
of weapon systems, and do they provide PMs with adequate guidance? 

The current policies and regulations that govern reliability of weapon systems 
include DoD 5000.2-R, AR 70-1, and DA Pamphlet 70-3. They all do a fairly good job 
with respect to addressing policy and procedural guidance on reliability and 
maintainability (R&M) requirements with regard to weapon systems. 


3. How does the Army address reliability performance of a weapon
system in the requirements generation process, and to what extent can 
a PM influence this process? 

Reliability requirements are developed by the CBTDEV in conjunction with the
MATDEV as part of an Integrated Concept Team Process that the PM participates in. 
Three key elements combine to define overall reliability performance requirements: 1) 
operational and logistics reliability parameters; 2) the OMS/MP of the system; and 3) 
failure definition and scoring criteria. A PM can influence this process by participating
in the IPT and providing reliability realism in terms of what the current state-of-the-art is. 


4. How is reliability addressed in the system engineering process, and
what technology, tools and techniques are available to ensure 
reliability of a system is "designed in" upfront? 


The starting point for designing in reliability is the systems engineering process
beginning with requirements definition and analyses, and the conduct of cost/benefit
trade-off analyses to determine alternative requirements, allocations, and design
solutions. Examples of the types of technology, tools and techniques that are available
include the following:

• Physics of Failure (PoF). 

• Critical Items List/Analysis.

• Identification of Potential Reliability Problems.

• Software Reliability Assessment.

• Redundancy.

• Variability Production Processes & Quality Assurance.

• Parts Control Program. 

• Allocation and Prediction. 

• FMECA, FRACAS, and FTA. 

5. How has acquisition reform and the shift to performance based 
contracting impacted the reliability of weapon systems? 

Although there is mixed opinion on their effects on reliability, acquisition reform 
and performance based contracting have allowed the contractor the flexibility to 
determine exactly how the reliability requirements will be achieved. They do not give 
the contractor relief from the requirement; it still must be met.


6. To what extent does commercial industry differ in their approach
towards product reliability, and can the Army leverage these best 
practices to improve performance in military systems? 

There are numerous differences between the needs of the military customer and
those of the commercial customer. The reliability needs of the military focus primarily on
operational readiness (product performance on demand), operational longevity (long
useful life vs. short life cycles), operational supportability (repair/replace vs. throwaway
items), and operational robustness (satisfactory performance over environmental
extremes). Industry typically tests for reliability early and continuously throughout the
development cycle of a product. The Army, on the other hand, tends to treat reliability
like a final exam, and should embrace the commercial industry philosophy.

7. How is system reliability addressed as part of the test program, and 
what program strategies can a PM employ to ensure that a system will 
successfully pass reliability testing with a high level of confidence? 

Reliability should, in practice, be tested throughout the test program of a system. 
The key is to test early and test often. Various contractor and government tests can be 
used to demonstrate compliance to contractual and operational reliability requirements: 

• Environmental Testing 

• Accelerated Testing 

• Reliability Development/Growth Testing (RD/GT) 

• Reliability Qualification/Demonstration Testing (RQ/DT) 

• Government Developmental Testing (DT) 

• Operational Testing (OT) 

• Early User Test (EUT)/Limited User Test (LUT) 

• Initial Operational Test (IOT) 

• Follow-On Test (FOT) 

8. How do PMs plan to manage and track reliability, and what metrics 
are useful for measuring reliability performance during various stages 
of system development? 

PMs manage and track reliability via contractor testing, reliability growth tracking 
methodology, major design reviews, testing, and field data collection. Typical 
reliability measures include MTBSA, MTBOMF, MTBEFF, MTBOMA, MTBMAF, and 
Ao. 

9. How does a PM contract and incentivize for reliability with industry, 
and are there potential areas for improvement? 

This is not a widely used practice, and is one that needs to be pursued further. 
Incentives could be tied to demonstrations of reliability growth, and to successful 
achievement of a system's reliability requirement during a major test. 


10. Once a system is fielded, how does a program office ensure reliability 
performance is maintained, and what further can be done to improve 
reliability performance of fielded systems? 


PMs can ensure reliability of a system is maintained by setting up a system that 
collects and tracks reliability failures in the field. One technique for improving reliability 
performance in the field is to identify cost effective reliability improvements and 
incorporate through system upgrades. 


D. RECOMMENDATIONS FOR FURTHER STUDY 

The following are recommended topics for additional research: 

• Evaluate how and what kind of reliability data is currently collected in the 
field, and determine how to best optimize the process so that there is proper 
feedback for reliability improvement. 

• Analyze the best methods and approaches for incentivizing reliability in 
contracts. 

• Compare and contrast the commercial model for achieving highly reliable 
systems with that of the DoD. Assess how this can be best adapted for weapon 
system development. 

E. THESIS SUMMARY 

Managing weapon system reliability performance demands constant attention and 
implementation of effective management strategies that balance cost, schedule and 
performance against reliability risks over the course of a weapon system’s development 
and fielding. The key to it all resides in early identification of upfront cost-effective 
opportunities for improving reliability performance, and mitigation of associated risks 
during design, manufacturing development, test, and post-production. Predictability in 
the field is the desired end state. 

Reliable weapon systems are a critical element to fighting and winning wars. To 
put this all in perspective, at the U.S. Army Forces Command (FORSCOM), where 
warfighting readiness is the number one priority and their soldiers are “on point for the 
nation”, their primary mission is to train, mobilize, and deploy ready ground forces in 
support of the National Military Strategy. FORSCOM has openly stated that in order to 
support their number one requirement of readiness, they require predictable weapon 
systems. That need for predictability equates to a requirement for reliable systems. 

APPENDIX A: WEAPON SYSTEM RELIABILITY 
PERFORMANCE SURVEY 


Directions: This survey is being conducted to support research as part of a Naval 
Postgraduate School Thesis on challenges in managing weapon system reliability performance. The 
results of this thesis are intended to directly benefit any PM that is, or will be managing complex 
programs, by identifying common reliability management issues and potential pitfalls, why they occur, 
risk mitigation techniques, lessons learned, and suggestions for improved methods for managing and 
reducing the inherent risks associated with achieving stated reliability performance requirements. 

The research is limited to a cross-section of systems in various stages of the acquisition 
process that are managed within the Program Executive Office for Intelligence, Electronic Warfare & 
Sensors (PEO IEW&S). The analysis is limited to an assessment of reliability management and 
process issues, and does not specifically address commodity or technology driven reliability problems. 

Please answer the following questions and email them back NET 30 Nov 2001. A separate 
survey is required to be filled out for each participating program. 

** Results will be represented in aggregate form, not program specific ** 

Project/Program Management Office: select here (click on dropdown list) 

Program/System Name: select here (click on dropdown list) 

Current Life Cycle Phase: 

□ MS A (specify CE or CAD: ) 

□ MS B (specify SI or SDD: ) 

□ MS C (specify LRIP or FRP: ) 

□ Operations & Support (how long has it been in the field? years) 

□ Other or N/A (MS phase as defined under the old 5000 model: ) 

Required Reliability/Availability: (specify reliability requirement/measure in terms of 
MTBF, MTBCMF, MTBOMF, MTBMA, Ao, etc.) 

□ ORD (state value, e.g. 300 hrs MTBF, 95% Ao) 

□ Contract (state value) 

□ Other (state value) 

Measured Reliability/Availability: (quantify measured reliability results consistent with 
measures/units from above, e.g. 300 hrs MTBF, 95% Ao) 


□ DT            results:        Passed? Y □ N □ 

□ RQT/RDGT      results:        Passed? Y □ N □ 

□ OT            results:        Passed? Y □ N □ 

□ Field Data    results:        (how collected: ) 

□ Contractor    claims: 

□ Other         results:        Passed? Y □ N □    (state type of test: ) 


Has the system experienced any major reliability test failures? (i.e. failed DT or IOTE 
reliability performance requirements) Yes □ No □ 

Explain: 

Survey Questions: (please answer all questions. If a question does not apply to your 
program due to its current acquisition phase, please answer based on experiences encountered in 
prior phases. Check all boxes that apply. I have left room after each question for additional 
commentary if you find it necessary) 

1. How is the system reliability program and corresponding management approach to 
such formally documented within your program? (check only the primary overriding document) 

□ Reliability Program Plan □ Contract SOW □ TEMP □ SAMP 
□ No formal reliability management plan □ Other (explain: ) 

Additional comments: 

2. Who within your organization is primarily responsible for reliability activities for this 
particular program? (check only one) 

□ PM 
□ Project Leader 
□ Systems Engineering Team Lead 
□ Logistics/Supportability Team Lead 
□ Test Team Lead 
□ Reliability IPT (formally chartered IPT? Y □ N □) 
□ Prime Contractor 
□ No one specifically 
□ Other (please explain: ) 

Additional comments: 

3. What contractual design tools were/are employed to ensure reliability is “built in” 
early on in the program? (check all that apply) 

□ Physics of Failure (POF) techniques 
□ Critical Items List/Analysis (i.e. complex, state-of-the-art 
technology, high cost, single source, or single failure point component) 
□ Identification of potential reliability problems (i.e. known 
reliability problem areas) 
□ Software Reliability Assessment 
□ Quality Function Deployment (explain: ) 
□ Parts Control Program 
□ FMECA, FRACAS, Fault Tree Analysis 
□ Other (describe: ) 

Additional comments: 

4. Identify the types of test activities that have been or will be used to determine compliance 
as part of your system’s reliability program. (check all that apply) 

□ Environmental Testing 
□ Accelerated Testing (e.g. HALT) 
□ Reliability Development Growth Test (RDGT) 
□ Reliability Qualification/Demonstration Test (RQT or RDT) 
□ Government Developmental Testing 
□ Operational Testing (type, i.e. LUT/OPTEMPO/IOT/FOT: ) 
□ Other (describe: ) 

Additional comments: 

5. Is the amount of time and funding allotted for reliability testing during DT sufficient 
for your program? (for systems beyond DT, answer in terms of how your program was 
postured going into DT at the time) 

□ Current schedule and available funds are sufficient (low risk now) 

□ Could use more time/$$ to reduce risk (medium/high risk now) 

□ No comment 

Additional comments: 


6. Does your program incorporate a reliability growth program? 

□ Yes (where is this detailed? ) 

□ No 

□ N/A (check this only if system is already fielded and there are no 
current plans for improving the inherent system reliability) 

Additional comments: 

7. If your system has already participated in an IOTE, did your success in either DT or 
RD/GT (or other reliability testing) correlate with success in IOTE? (check all that apply) 

□ Yes, success in pre-IOTE reliability testing led to reliability requirements being fully met 
in IOTE 

□ Not completely, system did well in pre-IOTE reliability testing, but had some new 
problems during IOTE that needed correcting 

□ Not at first, system passed IOTE after # attempts (click on dropdown list) 

□ N/A, system has not yet been involved in an operational test 

Additional comments: 

a. To what level was your system’s ORD reliability requirement demonstrated? (state 
in terms of % of ORD requirement met) 

During DT?      During OT? 
□ 100%          □ 100% 
□ >80%          □ >80% 
□ >60%          □ >60% 
□ >40%          □ >40% 
□ >20%          □ >20% 
□ <20%          □ <20% 

8. Does (or did) your program have [illegible] reliability? 


□ Yes {provide details: ) 

□ No 

Additional comments: 


9. Have the User, Tester, Contractor, and PMO all agreed upon the method (model) to 
be used in reliability calculations? 

□ Yes (where is this documented, e.g. contract, TEMP, SEP? ) 

□ No 

□ Not sure 

Additional comments: 


10. Is Reliability identified as a Key Performance Parameter (KPP) in the system 
Operational Requirements Document? 

□ Yes 

□ No 

a. If not a KPP, for systems still in development, where is reliability ranked in terms of 
requirements “tradespace”? 

□ Highest tier priority/Band A 
□ Middle tier priority/Band B 
□ Lower tier/Band C or below 

Additional comments: 

11. Were you as the MATDEV able to influence incorporation of realistic reliability 
requirements as part of the ORD process? 

□ No, requirements were developed independently by COMBATDEV 
□ Yes, input was provided as part of ICT or RAM rationale process 
□ Other (explain: ) 

Additional comments: 

12. Was reliability included as a factor in the source selection process? 

□ Yes (provide details: ) Was it a significant discriminator? Y □ N □ 

□ No 

Additional comments: 

a. How are ORD reliability requirements for your program translated into actual 
contractual reliability requirements? (base on last contract awarded) 

□ ORD paragraphs relative to reliability are restated in SOW/Spec (i.e. contract requirement 
is equal to ORD requirement) 

□ Additional levels of reliability are applied to the contract 
(briefly describe process) 

□ Comprehensive reliability requirements are not adequately stated in the contract 

□ Other (explain: ) 

13. Are there incentives employed in the contract that are specifically tied to achieving 
system reliability performance requirements? 

□ Yes (describe: ) 

□ No 

a. If yes, did these incentives achieve their desired effect? 

□ Yes 

□ No 

□ Too early to tell 

Additional comments: 

14. Are you aware of any specific DoD or Army policy/regulation regarding weapon 
system reliability management? 

□ Yes (if yes, which do you use to help you manage reliability? ) 

□ No 

□ Not sure 

Additional comments: 

15. What risk mitigation techniques does your program employ that address system 
reliability performance issues? 

Briefly describe: 

Additional comments: 


16. How do you measure and track reliability performance progress over time in your 
program? (check all that apply) 

□ By contractor projections/analysis 
□ Reliability growth tracking methodology 
□ At major reviews (PDR, CDR, TRRs, etc.) 
□ Other (please specify: ) 

Additional comments: 

17. In your opinion, has acquisition streamlining (e.g. performance specifications, use of 
COTS, etc.) and/or the continued trend of government downsizing contributed either directly 
or indirectly towards reliability shortfalls experienced by programs in general? 

□ Yes, acquisition streamlining (provide details: ) 

□ Yes, government downsizing (provide details: ) 

□ Yes, both (provide details: ) 

□ No 

□ No comment 

a. If COTS/NDI components were/are utilized in the design of your system, did the 
COTS components realize the reliability performance claims of the OEM? 

□ Met 

□ Exceeded 

□ Less (provide details, e.g. problems with integration, use in military 
environment, improper claims, etc.: ) 

□ N/A (no COTS/NDI in system design) 

Additional comments: 

b. Given the realities of streamlining and downsizing, do you believe the Army 
reliability community has adequately compensated with alternative policies, processes and 
tools? 

□ Yes 

□ No 

□ No comment 

c. Do you have any suggestions for improvement? (explain: ) 

Additional comments: 

18. For “fielded” systems only, please answer the following: 

a. Was or is your program fielded in a “conditional materiel release” status due in 
part to failure to meet ORD RAM requirements? 

□ Yes (is CMR still in effect? Yes □ No □) 

□ No 

Additional comments: 

b. How is collection of reliability field data performed to gather failure and repair 
histories? 

□ Depot or CLS Maintenance records 
□ Warranty data gives us this information 
□ Reliability data not formally collected 
□ Other (explain: ) 

c. Does current field reliability data indicate your system still meets or exceeds the 
ORD reliability requirement? 

□ Yes 

□ No 

□ Reliability data not formally collected 

Additional comments: 

d. Has any of the reliability failure data collected led to identification of O&S cost 
drivers that subsequently led to cost effective improvements? 

□ Yes (if significant improvements, please expand upon: ) 

□ No 

Additional comments: 

e. Is there a formal reliability improvement program for your system? 

□ Yes (if yes, where documented? ) 

□ No 

Additional comments: 

f. Does your system employ a Reliability Centered Maintenance program? 

□ Yes (if yes, how is it formally implemented? ) 

□ No 

Additional comments: 

19. Does your program employ or leverage any commercial best practices in terms of 
reliability performance management? (e.g. physics of failure, predictive technologies, 
prognostics/life consumption monitoring, identification and mitigation of failure modes/mechanisms 
(FMECA), accelerated life testing, growth testing, selection of reliable parts) 

□ Yes (identify: ) 

□ No 

Additional comments: 

20. Rank order the following Army Top 10 reliability management problems: 
(click on dropdown list for each) 

Reliability is not a KPP 
Contractor not designing for reliability sufficiently above requirement 
Contractors not using best commercial practices 
Not aggressively “designing-in” reliability upfront 
Poor reliability planning and growth planning (test too late) 
Inadequate policies and guidance (need updating) 
Insufficient reliability testing to verify requirements 
Unrealistic reliability requirements with inadequate rationale 
Need more qualified personnel in reliability management 
Not consistently improving reliability after fielding 
Other (fill in your own: ) 

Additional comments: 

Please provide any other comments, observations, or lessons learned that you would like to 
share here (use additional sheet if necessary): 


Thank you for your time and support in filling out this survey. 

APPENDIX B: BCIS RELIABILITY DEVELOPMENT GROWTH 
TEST (RDGT) STRATEGIES 


1. Purpose: The purpose of this paper is to provide a preliminary strategy for 
development of a course of action for the BCIS RDGT to be conducted as part of 
PVT II for subject system. The paper provides some basic concepts relative to 
growth methodology, the parameters which define a growth curve (and thus growth 
test strategy); the sensitivities of those parameters to test length and the risk 
associated with various strategies. A feasible strategy is then fashioned within the 
resource constraints of budget and time given technical feasibility. 

2. Growth Parameters: The formula for an idealized growth curve is given by: 

(1) $M_f = \dfrac{M_i}{1-\alpha}\left(\dfrac{T}{T_i}\right)^{\alpha}$ 

Where: Mf = Final MTBF value we wish to grow to (see note). 

Mi = Initial MTBF of the system starting test. 

T = Total number of test hours. 

Ti = Time to first failure; i.e., when our first fix will be implemented. 

α = Growth rate. 

Note: The Mf value represented by this formula is to achieve your reliability 
requirement at the point estimate level. It is desirable and standard practice to meet 
requirements with 80 percent confidence, which would then be the desired value to grow 
to; i.e., in order to demonstrate confidence, one must grow to a MTBF value higher than 
the requirement level. This formula does not allow for computation of final MTBF 
values at confidence levels and is given here for illustrative purposes. Computations on 
final MTBF values (i.e., meeting requirement with confidence) were done using AMSAA 
generated software routines. 

3. Calculation of BCIS Reliability Growth Parameters: In order to determine test 
requirements for the BCIS RDGT, estimates for a number of growth parameters had 
to be constructed based on a number of factors: assumptions, historically feasible 
growth rates, the current estimate of BCIS reliability, and testing constraints 
such as number of units under test, number of fixes to be implemented, and 
calendar time available for testing. Test duration is sensitive to the growth rate (α), the 
requirement we wish to demonstrate with confidence, the starting or initial MTBF 
(Mi), and the expected elapsed time before we see our first failure and put in our first 
fix (Ti). Based on the calendar time available for test (4 months) and units under test 
available (3), the maximum time available was computed at 8,640 hours. However, 
some of that time must be allotted for implementation of fixes or corrective actions 
for failures found during the test, so that the reliability of the system may be matured. 
A growth rate of α = 0.45 was assumed (based on historical data); this is an 
acceptable but high risk level for growth achievement, and anything less would violate the 
calendar time constraints. A starting or initial MTBF (Mi) of 560 hours MTBEFF 
was assumed using reliability projection methodology and based on a Fix 
Effectiveness Factor (FEF) of 95% on corrective actions to failure modes occurring 
during PVT several years ago (this area will be elaborated on subsequently). Ti is 
estimated (heuristically) at 2.5 to 3.0 times Mi. The foundation is to find a time interval 
Ti such that we have a .90 to .95 probability of at least one failure occurring, given an 
initial MTBF of Mi, using a Poisson distribution with mean λ = Ti/Mi. Figure 1 
represents the idealized growth curve based on the above parameter construct. 
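
The heuristic for Ti and the 8,640-hour ceiling can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the 30-day month is an assumption chosen because it reproduces the 8,640-hour figure quoted above.

```python
import math


def prob_at_least_one_failure(t_i: float, m_i: float) -> float:
    """P(N >= 1) when the failure count in t_i hours is Poisson with mean t_i / m_i."""
    mean_failures = t_i / m_i
    return 1.0 - math.exp(-mean_failures)


def max_test_hours(months: int, units: int, days_per_month: int = 30) -> int:
    """Calendar hours available across all units (30-day months are an assumption)."""
    return months * days_per_month * 24 * units


M_I = 560.0  # initial MTBF in hours MTBEFF, taken from the text

for multiple in (2.5, 3.0):
    t_i = multiple * M_I
    p = prob_at_least_one_failure(t_i, M_I)
    print(f"Ti = {t_i:.0f} h ({multiple} x Mi): P(at least one failure) = {p:.3f}")

print("Maximum available test hours:", max_test_hours(months=4, units=3))
```

Under these assumptions, multiples of 2.5 and 3.0 give probabilities of roughly 0.92 and 0.95, which falls inside the .90 to .95 band stated in the text.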


[Figure 1: BCIS RDGT Requirements. Idealized growth curve plotted against test hours. 
Annotations: BCIS planned RDGT demonstrates requirement at 70% confidence. 
Assumptions: starting MTBEFF of 560 hrs (assumes 95% FEF of previously identified 
failure modes); growth rate (alpha) of 0.45. Test program of 7468 hrs allows for 
demonstration of requirement with 70% confidence.] 

Figure 1. BCIS Technically Feasible RDGT Strategy 

Figure 1 represents the idealized curve of a technically feasible RDGT strategy 
constrained by calendar time and test resources. This strategy calls for 7468 hours of 
testing (combined on three available units). The expected number of fixes was calculated 
as six. The difference between the maximum available time and the actual test hours 
provides time for fix implementation. Given the achievement of a 0.45 growth rate and 
the initial starting MTBEFF, the system can grow to a value of 2163 hours MTBEFF, thus 
meeting the contractual specification of 1380 hours MTBEFF with 70% confidence. Given the 
constraints mentioned, this is the maximum confidence allowable by this test (more on 
confidence later). 
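
As a rough check on these figures, equation (1) can be evaluated directly. This is a minimal sketch rather than the AMSAA routines mentioned in the note: the function names are hypothetical, Ti = 1400 hours is an assumption (the low end of the stated 2.5 to 3.0 times Mi range), and the expected-failure count assumes the usual power-law form in which cumulative MTBF at time T is Mi(T/Ti)^α.

```python
def final_mtbf(m_i: float, alpha: float, total_hours: float, t_i: float) -> float:
    """Equation (1): Mf = [Mi / (1 - alpha)] * (T / Ti) ** alpha."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("growth rate alpha must be in [0, 1)")
    return (m_i / (1.0 - alpha)) * (total_hours / t_i) ** alpha


def expected_failures(m_i: float, alpha: float, total_hours: float, t_i: float) -> float:
    """Expected failures by time T, assuming cumulative MTBF = Mi * (T / Ti) ** alpha."""
    cumulative_mtbf = m_i * (total_hours / t_i) ** alpha
    return total_hours / cumulative_mtbf


M_I = 560.0        # initial MTBF (hours MTBEFF), from the text
ALPHA = 0.45       # growth rate, from the text
T_TOTAL = 7468.0   # planned test hours, from the text
T_I = 2.5 * M_I    # assumed: low end of the 2.5 to 3.0 x Mi heuristic

m_f = final_mtbf(M_I, ALPHA, T_TOTAL, T_I)
n_fail = expected_failures(M_I, ALPHA, T_TOTAL, T_I)
print(f"Final MTBF: {m_f:.0f} hours MTBEFF")
print(f"Expected failures over the test: {n_fail:.1f}")
print(f"Hours left for fixes out of 8,640: {8640 - T_TOTAL:.0f}")
```

With Ti taken as 2.5 times Mi, the point-estimate final MTBF comes out at about 2163 hours, matching the value quoted above, and the expected failure count is a little over six.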

4. Sensitivity of RDGT Hours to Initial MTBF: Test duration is highly sensitive to 
growth rates and initial MTBF values. If we start out higher on the growth 
curve, then test length can be decreased for the same growth rate, or the growth rate 
can be decreased (lesser risk) for the same test length. The construct of the highly 
sensitive initial MTBF (Mi) was based on the application of reliability projection 
methodology to fixes implemented for failure modes occurring during PVT several 
years ago. The initial MTBF value computed actually represents the reliability 
“potential” of the system, or theoretical upper bound of achievable reliability, given a 
certain fix effectiveness for fixes implemented and no new failure modes occurring in 
the system (in effect a biased estimate, since we never really test long enough to 
discover all the failure modes in a system). The short form equation for calculation of 
reliability potential is given as: 

(2) $\text{Potential MTBF} = \dfrac{\text{MTBF (from PVT)}}{1 - (\text{MS})(\text{FEF})}$ 

where: MS represents the Management Strategy, or the percentage of failure 
modes to be addressed through corrective action. The MTBF value from PVT is 28 hours. 

The contractor in conjunction with the PMO took an aggressive stance w.r.t. 
corrective action implementation for failure modes occurring during PVT and addressed 
100% of all failure modes (hence, MS goes to 1). Clearly, calculation of the potential 
MTBF (which will be our value for Initial MTBF (Mi) in the formulation of the growth 
test curve) is highly dependent on FEF which in turn impacts growth test length. Figure 
2 provides an illustration of the sensitivity of our initial MTBF (Mi, which is dependent 
on FEF) on the test length needed to satisfy requirement compliance at the 70 percent 
confidence level. 
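
Equation (2) is simple enough to evaluate for several FEF values. The sketch below is illustrative: the function name is hypothetical, the PVT MTBF of 28 and MS = 1 come from the text, and the list of FEF values is chosen here only to show the sensitivity.

```python
def potential_mtbf(mtbf_pvt: float, ms: float, fef: float) -> float:
    """Equation (2): potential MTBF = MTBF_PVT / (1 - MS * FEF)."""
    denominator = 1.0 - ms * fef
    if denominator <= 0.0:
        raise ValueError("MS * FEF must be less than 1")
    return mtbf_pvt / denominator


MTBF_PVT = 28.0  # MTBF observed in PVT, from the text
MS = 1.0         # 100% of failure modes addressed, from the text

# FEF values selected for illustration only.
for fef in (0.70, 0.90, 0.95, 0.98):
    m_i = potential_mtbf(MTBF_PVT, MS, fef)
    print(f"FEF = {fef:.2f} -> initial (potential) MTBF = {m_i:.0f} hours")
```

Because the denominator shrinks quickly as FEF approaches 1, a small change in assumed fix effectiveness moves the initial MTBF a long way: 0.90 gives about 280 hours, 0.95 gives about 560 hours, and 0.98 gives about 1400 hours.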


[Figure 2: Sensitivity of Test Hours Required for Requirement Demonstration to Initial 
MTBEFF. Annotations: required test hours are highly sensitive to starting or initial 
MTBEFF; initial MTBEFF is uncertain (effectiveness of fix from PVT, new design); 95% 
chosen as a fairly aggressive FEF; FEF in excess of 98% requires a demonstration test, 
not an RDGT.] 

Figure 2. Sensitivity of Test Lengths to Initial MTBF and FEF 

Figure 2 shows the required test hours for various initial MTBF values computed 
using formula (2). In parentheses are the corresponding FEF values that produced the 
initial MTBF value; e.g., a 0.90 FEF value corresponds to an Mi of approximately 280 
hours, a 0.95 to our 560 hour Mi, etc. There is much uncertainty surrounding 
the calculation of this estimate. The fixes implemented appear to be very sound from an 
engineering perspective, but are they really that effective? The historical average FEF 
across all systems is about .7 or 70 percent; i.e., fixes are effective in reducing the failure 
rate of that particular failure mode by 70 percent. However, in programs such as Comanche 
an 81% FEF is achieved, with some components (electronics) as high as 90-95%. There 
is also uncertainty as to the new design. Any FEF below 0.95 will not allow for a 
sufficiently high initial MTBF to allow reasonable requirement demonstration 
given our constraints; i.e., it will significantly increase test hours. By the same token, if the 
fixes are highly effective (98%) and the design sound, then an RDGT would not be 
necessary, only an RDT (Reliability Demonstration Test), at significantly fewer hours. 
However, this cannot be ascertained, and the initial MTBF (Mi) becomes the single most 
critical parameter for this excursion. 

5. Confidence: Figure 3 provides a sensitivity of confidence versus test duration 
using the values given for Mi and Ti, along with our 0.45 alpha rate. As can be seen in 
Figure 3, the cost of additional confidence is increased test hours, which will violate 
our schedule constraints; e.g., to demonstrate at 80 percent confidence will require 
approximately 9,000 test hours. It is felt that the current test length and confidence are 
sufficient for demonstration of the specification requirement of 1380 hours MTBEFF. 
Although this requirement is indicative of the hardware/software reliability of the 
system, it is felt that demonstration with this level of confidence will provide enough 
“slack” to allow for any failure rate attributed to operator/maintainer inducement so 
that the operational requirement in the ORD may be realized. 


[Figure 3: BCIS RDGT Strategies. Test hours required to demonstrate the requirement 
(1380 hrs MTBEFF) at various confidence levels, plotted against % confidence requirement 
demonstrated. Annotations: additional test hours required for greater confidence; given 
test resources of 3 units for 4 months, max time = 8,640 hrs; expected # failures = 6; 
70% confidence demo feasible.] 

Figure 3. Sensitivity of Confidence Levels To Test Lengths 

6. Risk: Assessing risk relative to an RDGT is dependent upon a number of 
factors: growth rate achievability, ratio of final MTBF to initial MTBF value, and the 
uncertainty surrounding the true initial MTBF (Mi). (1) Growth rate: A growth rate of 
0.45 for continuous time systems is considered very aggressive. The average growth rate 
for developmental time/mileage (continuous) systems ranges from 0.30-0.30. (2) Ratio: If the 
ratio of our final MTBF to our initial MTBF is greater than six, the program is considered 
high risk (based on AMSAA study). Our ratio is 2163/560 or approximately 3.9. 
(3) Finally, the uncertainty regarding the estimate of our initial MTBF based on a FEF of 
0.95 remains unascertainable. The risk is considered high. 

7. Summary: Contained within is the basis for the construct of a feasible RDGT 
strategy that satisfies the constraints of available test hours, time to implement fixes, and 
availability of test assets. Given the technical aspects, this strategy (Figure 1) is feasible, 
but high risk. Given success, it does provide for adequate levels of confidence relative to 
reliability requirement compliance. Additional details need to be worked relative to test 
conduct, mode of equipment operation, and temperature and vibration profiles. 

APPENDIX C: HUNTER UNMANNED AERIAL VEHICLE (UAV) 
SYSTEM RELIABILITY GROWTH STORY 

15 Oct 2001 



Figure 1. System MTBF(L) 

ABSTRACT: This paper describes how the Hunter Unmanned Aerial 
Vehicle (UAV) System Mean Time Between Failure (MTBF) grew from 3.6 to 
10.9 hours (Figure 1). To meet the U.S. Army’s urgent need for a UAV System, 
the Hunter System integrated existing technology without going through the 
normal development phase. A contract was awarded for a Technical Evaluation Test 
(TET) in 1989, followed by a Limited User Test (LUT). Flight competition 
occurred during 1990-1991, and the Low Rate Initial Production (LRIP) award 
was granted in Feb 1993. 

During system acceptance testing in 1995, several Air Vehicles (AVs) 
were lost due to various failures, resulting in a decision to terminate the follow-on 
production program. However, the Army wanted to benefit as much as possible 
from the substantial investment made. Therefore, the UAV-SR Program 
Management Office (PMO) and the TRW Program Office (the system prime 
contractor) decided to perform an “end to end” Failure Mode Effect and Criticality 
Analysis (FMECA) and a Fishbone Analysis on all the critical subsystems, 
involving subject matter experts from each group, including all the major 
subcontractors. This process identified the root causes, developed technical 
approaches, and implemented corrective actions on the critical issues to get the 
program moving forward, and on a UAV program, that means getting and 
keeping AVs in the air flying (Figures 2 & 3). 







Figure 3. System Reliability Improvements 

A decision was made to form an alliance with the senior technical 
representatives of TRW, the major subcontractors, the PMO, and the end item user 
(U.S. Army) in an Integrated Process Team (IPT) forum for the following functions: 

• Management IPT (PMO & TRW PM) - Overseer of the subtier IPTs and their 
effectiveness. 

• Risk Management Council (RMC) IPT - Mitigates flight and safety risks. 

• Failure Review Board (FRB) IPT - Provides visibility of trends. 

• Aviation Safety Council IPT - Focuses on operational safety. 

• Standard Evaluation Board for Operational Procedures (Technical Manuals - 
TM) IPT - Provides continuous updates and real-time information via field 
bulletins. 

• Depot Operations (Supporting Fielded Assets) IPT - Prioritizes the use of 
assets to meet field needs. 

• Engineering IPT (Design issues) - Prioritizes technical issues. 

• Extended IPT (On-site) - Technical support team provided with each system 
deployed, along with a database management system for the users and 
technical support teams to collect failure information. 

The organizational dynamics in forming the IPTs had dramatic effects. 
Involving the PMO, End Item Users, and Major Subcontractors with the TRW 
program team in identifying problems, developing technical 
approaches, setting priorities, and allocating limited resources allowed 
everyone to take ownership of the courses of action. The approach was so 
effective that the PMO institutionalized the process by incorporating the IPTs 
into its Statements of Work (SOWs) in subsequent years. 

The results of the system performance improvements continued to build 
the customer’s confidence level. Meeting the user needs and increasing the 
MTBF measurements have been very significant: 

• Deployment to the training base, Fort Huachuca, AZ, 1995. 

• Deployment to the first operational unit, Fort Hood, TX, 1996. 

• National Training Center (NTC) Demonstrations, 1996, 1997, 1998. 

• Balkan Wars and peacekeeping forces, 1999, 2000. 

• Deployment to Joint Readiness Training Center (JRTC), Fort Polk, LA, 1999. 

• Deployment to Interim Brigade Combat Team (IBCT), Fort Lewis, WA, 2000. 

• Providing a UAV platform for demonstration of effectiveness and proof of 
concept for U.S. Armed Services payloads. 

From 1996 through 2000, the Hunter UAV program developed a very 
satisfied customer community by proving that a reliable UAV system is, in fact, a 
valuable asset to the U.S. Army’s inventory. The Hunter approach to technical 
problems and success is a valuable lesson for any UAV program on customer 
satisfaction and reliability growth. 